The Forum for Discussion about The Third Manifesto and Related Matters


A Scheme library for the true RM

Quote from dandl on June 7, 2019, 3:15 pm

(a) As I said, the key features of TTM are (1) the unification of the type system between database and programming language, (2) the representation of headings as generated tuple and relation types, (3) the representation of relations as variables and (4) the implicit scoping of open expressions. If you abandon those, it would seem you should say what you intend will replace them.  I know Python well and Dee somewhat. My recollection is that it makes little or no effort to deal with any of these.

(b) Lambdas are easy, open expressions are not, at least in the way they interact with the type system. But I guess if you don't have relation or tuple types the point is rather moot.

(a) I don't intend to deal with many of them either, except that Dee has no private relvars and I do, though they cannot be unified with host-language variables.  Scheme like Python has no declared types for variables, only types for values (most-specific-types in versions of Scheme with inheritance), so (a)(1), (a)(2), and (a)(3) are moot.  (Note:  This is the American English sense of moot 'not worth discussing' rather than the British-and-elsewhere sense 'open for discussion'.)

(a)(4) and (b) are about automatically detecting from context when an expression actually produces an ordinary scalar value (as (+ 3 4) does) and when it produces a function with an argument.  This has a distinct spelling in Scheme: (lambda (x) (+ x 4)).  Note that this is not the same as (+ 3 x), which uses the variable x in the local scope rather than the lambda-bound argument.  All this is exactly like Python.  These considerations are exactly the same for any language with nested functions right back to Pascal (although Pascal does not have anonymous nested functions, which are different only in expressiveness, not in power).
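To make the distinction concrete, here is a minimal Scheme sketch (illustrative only, standard R7RS) contrasting a lambda-bound x with an x captured from the enclosing lexical scope:

    (import (scheme base))

    ;; x here is the lambda-bound argument; nothing is captured.
    (define add4 (lambda (x) (+ x 4)))
    (add4 3)                      ; => 7

    ;; In (+ 3 x), x is resolved in the enclosing lexical scope instead.
    (define x 10)
    (define add-to-three (lambda () (+ 3 x)))   ; closes over the outer x
    (add-to-three)                ; => 13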

Quote from johnwcowan on June 7, 2019, 5:00 pm

(a) I don't intend to deal with many of them either, except that Dee has no private relvars and I do, though they cannot be unified with host-language variables.  Scheme like Python has no declared types for variables, only types for values (most-specific-types in versions of Scheme with inheritance), so (a)(1), (a)(2), and (a)(3) are moot.  (Note:  This is the American English sense of moot 'not worth discussing' rather than the British-and-elsewhere sense 'open for discussion'.)

I said nothing about typed variables. A variable is a name which refers to a value, and TTM unifies the form of those names across function parameters, local variables, open expression variables, public relvars and a couple of kinds of pseudovariables. There is nothing lost by dynamically typing local variables, but the types used for persisting relational values in a database must be durable from one program invocation to another and assignment may be disallowed or have different consequences in each case.

No, those two senses of moot are not US oddities; they are both well known to the OED. Often (as in my usage) both senses are intended.

(a)(4) and (b) are about automatically detecting from context when an expression actually produces an ordinary scalar value (as (+ 3 4) does) and when it produces a function with an argument.  This has a distinct spelling in Scheme: (lambda (x) (+ x 4)).  Note that this is not the same as (+ 3 x), which uses the variable x in the local scope rather than the lambda-bound argument.  All this is exactly like Python.  These considerations are exactly the same for any language with nested functions right back to Pascal (although Pascal does not have anonymous nested functions, which are different only in expressiveness, not in power).

No, recognising an open expression is trivial from the context; it's binding names to the correct values that is the hard part. The Tutorial D document has quite a bit on the subject. But perhaps if you produce a more complete sample of your Scheme code, all will become clearer.

Andl - A New Database Language - andl.org
Quote from johnwcowan on June 5, 2019, 4:20 pm

I've been thinking a lot about the true RM in the interest of implementing a library for it in Scheme (a dialect of Lisp), somewhat similar to the Python implementation Dee.  In Scheme, everything is first-class except variables and syntax extensions (macros).  That aside, however, Scheme is a dynamically typed impure functional programming language: that is, assignment to a variable is permitted but discouraged, and compilers typically rewrite assignable variables to be bound to a mutable box object that holds the underlying value.  As such it is almost, but not quite, entirely unlike D.

Hi John, here's some detailed point-by-points. Firstly, regarding your approach: I'm all in favour of adding D-ness to an existing language, rather than trying to use an existing language to write an interpreter/compiler/translator from some stand-alone D (or DSEL D) across to some execution environment. As such, I find the TTM document rather unhelpful: what are the essential semantics for any RM, versus what are the authors' prejudices about what counts as a programming language or good language design?

RM PRE 26:  I'll do my best.

I think you should aim to follow Scheme's style, and not worry about the prejudices of authors who seem to see merit in SQL as a language (despite their protestations) or PL/1.

For a lot of your remarks/points where you're digressing from TTM as written, I'm unclear whether you're saying:

  • that's hard to do in Scheme, you're not going to try at first;
  • that's contrary to the semantics/style of Scheme;
  • you've a philosophical objection about TTM's view of the RM.

For Scheme purposes, a relvar can't be a variable, because given a Scheme variable's name all you can get is its value, ...

So is Scheme dramatically different to LISP? From what I recall (a long time ago), variables have potentially many properties, of which VALUE is only one.

whereas relvars have several other properties beside their relation value:  a name, a list of keys, a list of constraints, ...

Schema information is part of a relation's type. (And there I mean relation value including inferred keys, not just relvar.) In Higher-Order Typed languages, I would use phantom types.

possibly other things like a physical database reference.

Metadata about how relvars are implemented can go in the catalogue (RM Pre 25). No need to make it part of the language as such.

After muddling on it for a while, I came to the conclusion that a relvar for Scheme has to be a mutable object whose properties are the things mentioned above.  A Scheme variable can then be bound (typically but not necessarily immutably) to a relvar object.  The procedures (functions) that operate on relations must also be made to operate on relvars, which is easy thanks to dynamic typing.  (This introduces the possibility of relations that contain relvars, but I intend to sweep that under the rug: do that at your own risk.  The same is true of relations that contain first-class functions.)
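As a minimal sketch of that design (hypothetical names, not a settled API; relations shown as lists of association-list tuples purely for illustration):

    (import (scheme base))

    ;; A relvar is a mutable record carrying the properties listed above;
    ;; a Scheme variable is then bound to the relvar object itself.
    (define-record-type <relvar>
      (make-relvar name relation keys constraints)
      relvar?
      (name        relvar-name)
      (relation    relvar-relation relvar-set-relation!)   ; current relation value
      (keys        relvar-keys     relvar-set-keys!)
      (constraints relvar-constraints relvar-set-constraints!))

    ;; A toy relvar bound (by convention, immutably) to the variable S:
    (define S
      (make-relvar 'S
                   '(((SNO . "S1") (CITY . "London"))   ; relation as a list of tuples
                     ((SNO . "S2") (CITY . "Paris")))
                   '((SNO))                              ; one key: {SNO}
                   '()))                                 ; no other constraints yet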

It's essential to be able to add/drop/change the type of attributes for a relvar, as usual operations within the language. (That makes static compilation hard; which is why I earlier suggested using a monad for type-changing effects.)

I currently think the pr{e,o}scriptions will be satisfied except for the following (some knock-on consequences affect other 'scriptions, but just how is not stated here):

RM PRE 2: No update operators, at least not yet; only relational assignment (not the same as Scheme variable assignment, but rather mutating the "relational value" property of a relvar).

Update operators are covered by RM Pre 3 (points d., e.). The effect of SQL's INSERT, DELETE, UPDATE is typically treated as an update operator in TTM. The relation value of a relvar is a set; update operators are implemented as set operations with assignment of the result; you'll need those set operations anyway to be expressively complete (RM Pre 18); I'm not seeing the difficulty.
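For what it's worth, here is a sketch of that reading over the toy relvar/tuple-list representation above (relvar-insert!, relvar-delete! and relation-union are hypothetical names; SRFI 1 is assumed for filter and delete-duplicates):

    (import (scheme base) (srfi 1))

    (define (relation-union r1 r2)
      (delete-duplicates (append r1 r2) equal?))

    ;; "INSERT rv new-tuples" is just assignment of a union ...
    (define (relvar-insert! rv new-tuples)
      (relvar-set-relation! rv (relation-union (relvar-relation rv) new-tuples)))

    ;; ... and "DELETE rv WHERE pred" is assignment of a difference.
    (define (relvar-delete! rv pred)
      (relvar-set-relation! rv
        (filter (lambda (tuple) (not (pred tuple))) (relvar-relation rv))))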

RM PRE 5:  No update operators for components of scalar types either.

Doesn't Scheme have some way to access components of a data structure?

RM PRE 8:  Scheme types have equivalence procedures that are more fine-grained than type-specific equality; use them with relations at your own risk.

I'm unclear if you're saying you're not going to support equality testing between relation values? Again, it's just a set operation: IS_EMPTY((R1 NOT MATCHING R2) UNION (R2 NOT MATCHING R1)). If you  can't do that, there's something wrong.
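Rendered over the toy tuple-list representation, that test is just a two-way subset check (sketch only; SRFI 1's every assumed):

    (import (scheme base) (srfi 1))

    (define (relation-subset? r1 r2)
      (every (lambda (tuple) (and (member tuple r2) #t)) r1))

    ;; R1 = R2 iff each is a subset of the other, i.e. both anti-joins are empty.
    (define (relation=? r1 r2)
      (and (relation-subset? r1 r2)
           (relation-subset? r2 r1)))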

RM PRE 22:  I see no objection to supporting < and > for tuples in the sense that T1 < T2 if T1's attributes are a proper subset of T2's and the values of those attributes are equal.

Wrong. If T1, T2 have a different Heading, they're different types, so not comparable. You're implementing Subtyping-by-Extension. Definitely ruled out of TTM because there are better ways to do it whilst keeping within the RM. If you want to compare tuples like that, use tuple-level (RM Pre 20) [NOT] MATCHING.

For relations, I think R1 < R2 if R1's attributes are a proper subset of R2's, or R1 and R2 have the same attributes and R1's tuples are a subset of R2's.  I would like comment on this.

Wrong. For the same reason as comparing tuples of different type. I think your definition would also make the < operator pathological: non-transitive. (Using it as a subset test between relation values of the same Heading is OK.)

RM PRE 23: I intend to support only key constraints, not arbitrary database constraints, at least for public relvars.

Wrong. You at least must support  Inclusion Dependencies (aka Foreign Key constraints). That's a vital requirement for data structuring.

RM PRO 4:  Here I intend to introduce something bold.  Scheme has a Maybe type generator based on Haskell's (its name is Option in ML, Scala, and Java) but dynamically typed.  For example: Maybe Integer is the type of all objects that are either Just integers or Nothing.  This is different from NULL because Just 5 (e.g.) is not an integer and cannot be treated as one: it has to be unwrapped in a way that respects the possibility that it is Nothing. Integers remain integers and have no extra element NULL.  My sense is that while this violates the letter of RM PRO 4, it respects the spirit of it.  Comments welcome.
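For readers unfamiliar with the idiom, here is a minimal self-contained stand-in for a SRFI 189-style Maybe, showing that Just 5 is not an integer and must be unwrapped with the Nothing case in mind (illustrative only; the library itself would presumably use SRFI 189):

    (import (scheme base))

    (define-record-type <just> (just value) just? (value just-value))
    (define-record-type <nothing> (make-nothing) nothing?)
    (define nothing (make-nothing))

    (define maybe-five (just 5))
    (integer? maybe-five)            ; => #f : the wrapped value is not an integer
    (+ 1 (just-value maybe-five))    ; => 6  : unwrap explicitly ...

    (define (maybe-ref/default m default)   ; ... respecting the Nothing case
      (if (just? m) (just-value m) default))
    (maybe-ref/default nothing 0)    ; => 0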

This exposes that the authors of TTM are plain out of touch with modern language design. The Maybe type is an example of a tagged union. Any D should support those, before (and IMO in preference to) TTM's nutty Inheritance Model.

The semantics for a Maybe type does not correspond to/mimic the semantics of SQL's NULL, despite there being an army of programmers (some on this forum) who think it does. I suggest you just shut up about RM Pro 4.

It also allows outer joins to be safely introduced:  if R1 and R2 have attributes named A, B, C and A, X, Y respectively, all of type Integer, then R1 OUTER_JOIN R2 returns a relation with all five attributes, but only A retains the type Integer, whereas the others have the type Maybe Integer.  The tuples that an ordinary (inner) join would produce have Just Integer values, whereas the remaining tuples from both R1 and R2 have Nothing values.  The definitions of LEFT_OUTER_JOIN and RIGHT_OUTER_JOIN follow.

If you've introduced a Maybe type generator, you're entitled to introduce relational operators with that result type. Perhaps you'd like to explain what the result of invoking such an operator means, in terms of the meanings of its operands? And then the meaning of JOINing some other relation to that result? (That's the enormous hole in SQL's NULL gibberish that you haven't plugged.) Calling that an OUTER_JOIN is only going to invite debate.

OO PRE 1:  Scheme is dynamically typed, though some implementations have non-standard extensions that declare the types of both values and variables and can check them at compile time.

OO PRE 2:  No type inheritance in standard Scheme, though implementations provide it as an extension.

TTM has an idiosyncratic opinion on type inheritance. I would steer clear.

OO PRO 4, 5:  I have not yet resolved how best to implement transactions, but I will conform to these prescriptions when I do.

OO PRE 6:  I intend to introduce a general fold over relations, which requires the user rather than the system to provide an identity element.  Fold(x, +, 0) folds + over the sequence x as follows:  + is invoked on the first element of x and the identity 0 to produce an intermediate result r, then + is invoked again on the next element and r, and so on until the sequence is exhausted.  A zero-length sequence immediately returns 0, so for a sequence of length n, exactly n calls to + are made.  This generalizes to string concatenation and all sorts of other things: fold (left fold in Scheme) is almost a totipotent operator.
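A sketch of that fold over the toy representation, using SRFI 1's left fold (fold-relation and the tuple-as-alist layout are assumptions, not the library's API):

    (import (scheme base) (srfi 1))

    ;; Fold op over one attribute of a relation; exactly n calls to op are made
    ;; for n tuples, and an empty relation returns the identity unchanged.
    (define (fold-relation rel attr op identity)
      (fold (lambda (tuple acc) (op (cdr (assq attr tuple)) acc))
            identity
            rel))

    (fold-relation '(((QTY . 100)) ((QTY . 200)) ((QTY . 50))) 'QTY + 0)   ; => 350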

RM VSS 2, 4, 5: I intend to provide only key constraints and type constraints, at least for public relvars.

OO VSS 3:  My design will not be single-level.

Another constraint I am imposing: in implementations conforming to this design (including mine), scalars in public relvars MAY be limited to just numbers, strings of arbitrary length, single characters (which may be indistinguishable from strings of length 1), booleans, bytevectors (blobs), and the corresponding Maybe types.  Furthermore, lists will be forbidden in public relvars and possibly arrays as well.  These restrictions make certain implementation techniques easier.

I have also been thinking more speculatively about how to use a SQL DBMS as a back-end (this is distinct from providing access to existing tables).  For my purposes, each relvar is stored in one table, except that in dynamically typed DBMSes like SQLite the attribute type names have to be kept elsewhere.  Columns for non-Maybe types are marked NOT NULL, whereas for Maybe types they are marked NULL.
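A sketch of that column mapping (column-def and the type symbols are hypothetical; only the NULL/NOT NULL and declared-type-name conventions come from the paragraph above):

    (import (scheme base))

    (define (column-def name type)
      (string-append (symbol->string name) " "
                     (case type
                       ((integer)       "INTEGER NOT NULL")
                       ((maybe-integer) "INTEGER NULL")
                       ((string)        "TEXT NOT NULL")
                       ((maybe-string)  "TEXT NULL")
                       (else (error "unsupported attribute type" type)))))

    (column-def 'QTY  'integer)        ; => "QTY INTEGER NOT NULL"
    (column-def 'NOTE 'maybe-string)   ; => "NOTE TEXT NULL"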

I can't see that's going to work. The semantics for Maybe are not the semantics for NULL, see above. So that'll preclude using the SQL engine for queries -- unless you include a lot of code bloat testing ISNULL( ) with special handling.

The restrictions on scalars above mean that even low-conforming DBMSes can be used.  I think it will suffice to use the SQL system just to join and restrict, with everything else done in-memory.

What about comparing or JOINing a Nullable column to a non-Nullable column of the same based-on type?

(One issue is that a first-class function in Scheme used for restriction can't be "decompiled" into a SQL expression.  There may have to be two distinct representations of restriction functions.)

Look at Appendix A: restriction/WHERE is just a fancy way to spell JOIN. Also look at Hall, Hitchcock, Todd 1975: an 'Algorithmic relation' is a way to represent an anonymous function (they say lambda calculus is in the back of their minds). OTOH I hope Scheme's LAMBDA is rather better-principled than LISP's.

Quote from AntC on June 8, 2019, 6:48 am

Hi John, here's some detailed point-by-points.

Thank you very much for the thorough response.  (Public notice:  AntC and I know each other from another place.)

  • that's hard to do in Scheme, you're not going to try at first;
  • that's contrary to the semantics/style of Scheme;
  • you've a philosophical objection about TTM's view of the RM.

Any or all of these, plus "I'm heavily overloaded, and when specifying it is better to specify too little than too much, because you can always add things to a standard" (the curse of compatibility).

So is Scheme dramatically different to LISP? From what I recall (a long time ago), variables have potentially many properties, of which VALUE is only one.

Scheme variables, like Common Lisp local variables, have only a value and are typically compiled away as in most programming languages.  Common Lisp global variables and the variables of Elisp (and other archaic Lisps)  generally do have other properties that are visible (by reflection, as it were).

Schema information is part of a relation's type. (And there I mean relation value including inferred keys, not just relvar.)

A very good point, but I think I will omit key inference for now in favor of only being able to explicitly add and remove key specifications from a relvar.  (But see below.)

Metadata about how relvars are implemented can go in the catalogue (RM Pre 25). No need to make it part of the language as such.

It's primarily a convenience: if a relvar in memory knows where it lives in the database, it can be written back more easily with just (relvar-write relvar).

It's essential to be able to add/drop/change the type of attributes for a relvar, as usual operations within the language. (That makes static compilation hard; which is why I earlier suggested using a monad for type-changing effects.)

There's a problem in my current design:  you can replace a relvar object's value with any relation, but that will invalidate the meaning of the constraints.  I will think on this further.  I want a relation to stay a pure value, a heading + set of tuples as the RM demands.

Doesn't Scheme have some way to access components of a data structure?

Yes, and even mutate them (though the accessors and mutators can be non-existent or not exposed to other libraries).  But an update operator as D has them seems to be akin to passing an (ordinary) variable by reference, which is abhorrent to Scheme and many other languages.

I'm unclear if you're saying you're not going to support equality testing between relation values?

No, I'm saying that Scheme (like other imperative languages) has a notion of identity distinct from equality, because not all its objects are values.  This is == vs. equals in Java, or is vs. == in Python, for example.  You shouldn't expect identity, which may be false even when equality is true, to be preserved across writing into a database and reading back from it, as it is not part of the RM (what it calls identity is in this sense equality, but it does not matter since values are immutable).

If T1, T2 have a different Heading, they're different types, so not comparable.

Agreed, though I may provide such an operator on headings if I make them first-class.

For the same reason as comparing tuples of different type. I think your definition [of comparison operators on tuples] would also make the < operator pathological: non-transitive.

Scheme has historically done this sometimes: the superset and subset operators are spelled set<= and set>=, with a warning about the trichotomy law being broken.  But perhaps I won't follow that precedent here.

(Using it as a subset test between relation values of the same Heading is OK.)

Good, though the trichotomy law is again broken, since two sets may be neither subsets nor supersets nor equal.

You at least must support  Inclusion Dependencies (aka Foreign Key constraints). That's a vital requirement for data structuring.

Agreed.  As I said, I had thought (wrongly) that foreign-key relationships were part of "key constraints".  I certainly plan to have them.

If you've introduced a Maybe type generator, you're entitled to introduce relational operators with that result type. Perhaps you'd like to explain what the result of invoking such an operator means, in terms of the meanings of its operands? And then the meaning of JOINing some other relation to that result?

The central point is that a Maybe Int is a type completely disjoint from Int.  You cannot join on two attributes of the same name of which one is Maybe and the other isn't, any more than you can join on a string attribute vs. an int attribute.  What a Maybe type means in some broader sense, 'unknown' or 'inapplicable' or 'erroneous',  I leave up to users and philosophers.  An idea I just had is to have a standard unary relation operation which, given an attribute of type Maybe T, returns a relation omitting all tuples in which that attribute is Nothing and where the type of the attribute of the same name in the result is T.  I think this is Imielinski lifting, but perhaps not.
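A sketch of that unary operation over the toy representation (project-just is a hypothetical name; it relies on the Just/Nothing stand-in shown earlier and on SRFI 1's filter):

    (import (scheme base) (srfi 1))

    ;; Keep only tuples whose ATTR is a Just, and unwrap it, so the result
    ;; attribute carries values of T rather than Maybe T.
    (define (project-just rel attr)
      (map (lambda (tuple)
             (map (lambda (pair)
                    (if (eq? (car pair) attr)
                        (cons attr (just-value (cdr pair)))
                        pair))
                  tuple))
           (filter (lambda (tuple) (just? (cdr (assq attr tuple)))) rel)))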

I can't see that's going to work. The semantics for Maybe are not the semantics for NULL, see above. So that'll preclude using the SQL engine for queries -- unless you include a lot of code bloat testing ISNULL( ) with special handling.

What about comparing or JOINing a Nullable column to a non-Nullable column of the same based-on type?

The library will never generate SQL that tries to make such a join: see above.

Look at Appendix A: restriction/WHERE is just a fancy way to spell JOIN. Also look at Hall, Hitchcock, Todd 1975: an 'Algorithmic relation' is a way to represent an anonymous function (they say lambda calculus is in the back of their minds).

Haven't read either one yet, so I hope for future enlightenment.

OTOH I hope Scheme's LAMBDA is rather better-principled than LISP's.

Both Scheme and Common Lisp have had proper lexical closures since their beginnings in the 1970s.  Older Lisps' lambdas (including Elisp before the last few versions) did not, because of their broken dynamically scoped variables.

Haskell calls this operator on lists catMaybes, which is stupid because it doesn't catenate anything.

Quote from johnwcowan on June 9, 2019, 2:16 am

Haskell calls this operator on lists catMaybes, which is stupid because it doesn't catenate anything.

Huh? catMaybes :: [Maybe a] -> [a] https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Maybe.html#v:catMaybes

It catenates the Just x's into a List of x's, pruning out the Nothings.

Quote from johnwcowan on June 8, 2019, 3:58 pm

Doesn't Scheme have some way to access components of a data structure?

Yes, and even mutate them (though the accessors and mutators can be non-existent or not exposed to other libraries).  But an update operator as D has them seems to be akin to passing an (ordinary) variable by reference, which is abhorrent to Scheme and many other languages.

That's why I suggested a monad as the abstraction for updating. (And I'm only responding to this point because of the horror at the following point.) TTM's update to a component is a shorthand for updating the top-level variable. Add to that the requirement that structures be look-through, not OOP objects/encapsulated/opaque. Since you seem to be familiar with Haskell and other FP: TTM requires components to be accessible in a way that obeys the van Laarhoven lens laws.

I'm unclear if you're saying you're not going to support equality testing between relation values?

No, I'm saying that Scheme (like other imperative languages) has a notion of identity distinct from equality, because not all its objects are values.  This is == vs. equals in Java, or is vs. == in Python, for example.  You shouldn't expect identity, which may be false even when equality is true, to be preserved across writing into a database and reading back from it, as it is not part of the RM (what it calls identity is in this sense equality, but it does not matter since values are immutable).

Eek. Oh dear. I didn't realise that's what you were talking about. TTM requires value semantics, not identity/reference/object equality semantics. You might need to use Scheme's identity testing under the covers in your implementation. That must not get exposed in the surface D/D-ness. All sorts of fundamental relational operations (like JOIN) are defined in terms of value equality. This must work:

    v1 := SP WHERE QTY > 100

    v2 := SP MINUS (SP WHERE QTY <= 100)

    -- write v1, v2 to database; fetch them back

    v1 == v2   -- must return TRUE

If you've introduced a Maybe type generator, you're entitled to introduce relational operators with that result type. Perhaps you'd like to explain what the result of invoking such an operator means, in terms of the meanings of its operands? And then the meaning of JOINing some other relation to that result?

The central point is that a Maybe Int is a type completely disjoint from Int.  You cannot join on two attributes of the same name of which one is Maybe and the other isn't, any more than you can join on a string attribute vs. an int attribute.  What a Maybe type means in some broader sense, 'unknown' or 'inapplicable' or 'erroneous',  I leave up to users and philosophers.

OK. I'll leave you to blunder into it for yourself. I did try.

I can't see that's going to work. The semantics for Maybe are not the semantics for NULL, see above. So that'll preclude using the SQL engine for queries -- unless you include a lot of code bloat testing ISNULL( ) with special handling.

What about comparing or JOINing a Nullable column to a non-Nullable column of the same based-on type?

The library will never generate SQL that tries to make such a join: see above.

Then your users will need to lift a column to a Maybe/NULLable column. And comparing them (in the SQL engine) will use SQL's nutty NULL-to-NULL comparison, which'll return a different result vs a Nothing-to-Nothing comparison in Scheme. Good luck explaining that.

(This is really just another symptom of the incoherence of meaning you get from NULL. See D&D's writings anon.)

 

 

Quote from AntC on June 8, 2019, 6:48 am

[catMaybes] catenates the Just x's into a List of x's, pruning out the Nothings.

To my mind it's more like binding (flatmapping) than like catenation.  There's an alternative possibility with type [Maybe a] -> a -> [a], which uses a default value for Nothings rather than stripping them.  In any case, I am complaining about the lack of a good name, not the lack of a suitable operator.

TTM's update to a component is a shorthand for updating the top-level variable.

Schemers are not used to having such a shortcut, and my library won't provide one.  They expect either object mutation (which will break this library) or call by value operators plus explicit assignment.

v1 := SP WHERE QTY > 100

v2 := SP MINUS (SP WHERE QTY <= 100)

-- write v1, v2 to database; fetch them back

v1 == v2 -- must return TRUE

I don't dispute any of that.  But RM PRE 8 says (paraphrased) "If there exists an operator by which two values are distinguishable, those values must be unequal in the sense of =".  Scheme (like Lisps generally) already has an operator which can distinguish equal values: object identity, aka intrinsic identity, Leibnizian identity, identity of indiscernibles, the sense in which I, who am mutable, can be said to be identical to my previous self even though my properties differ.  (There is also an even finer predicate that might be called "appearance identity", a performance hack when even testing object identity is too costly.) In a world of pure values like D, these would be distinctions without differences, but that's not Scheme as a whole. The mere existence of these operators breaks RM PRE 8 (this part of which is really a PRO), so don't use them anywhere near my library.  That's all I'm saying.
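In standard Scheme terms the predicates in question are equal? (value equality), eqv? (object identity), and eq? (a cheaper test that may discriminate even more finely); a quick illustration:

    (import (scheme base))

    (define a (list 1 2 3))
    (define b (list 1 2 3))
    (equal? a b)    ; => #t : same value
    (eqv? a b)      ; => #f : distinct objects, even though equal
    (eq? a a)       ; => #t : the very same object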

I can't see that's going to work. The semantics for Maybe are not the semantics for NULL, see above. So that'll preclude using the SQL engine for queries -- unless you include a lot of code bloat testing ISNULL( ) with special handling.

Then your users will need to lift a column to a Maybe/NULLable column. And comparing them (in the SQL engine) will use SQL's nutty NULL-to-NULL comparison, which'll return a different result vs a Nothing-to-Nothing comparison in Scheme. Good luck explaining that.

Quite so.  The only reasons for using a SQL engine are for ACID (which a NoSQL database could also provide) and to minimize (not eliminate) the amount of unwanted data passing over the network between my library and the back end.   That means we restrict, join, and sort on the back end when feasible.  When using SQL operations isn't feasible because of semantic mismatch, particularly around NULLs, we don't do them.  That's also why I don't attempt to interoperate with random tables created outside my library.

Thus the NULL and NOT NULL information in the SQL catalogue only tells us when we have a Maybe type vs. a non-Maybe type so that we don't have to persist that information separately, just as the type name information (which is present even in SQLite, which barely uses it) allows us to maintain attribute type names, provided attribute types in public relvars are restricted as I wrote earlier.

 

 

A couple of things I've thought out:

On relvar assignment:  In order to preserve the meaning of constraints, it's necessary to have a dynamic-typing equivalent of the static typing rules.  What it amounts to is that when assigning a new relation to a relvar object, the system must verify dynamically that the heading of the existing relation and that of the new relation are equal before completing the assignment. (In addition, of course, the constraints currently present in the relvar must be checked.)  This means that every relvar must be initialized with a relation, even if it has cardinality zero.
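A sketch of that dynamic check, reusing the toy relvar record from earlier (hypothetical names throughout; note that a list-of-alists relation cannot report a heading when empty, which is one more reason a real representation should carry its heading explicitly):

    (import (scheme base) (srfi 1))

    (define (heading rel)                        ; toy: read the heading off the tuples
      (if (null? rel) '() (map car (car rel))))

    (define (same-heading? r1 r2)
      (let ((h1 (heading r1)) (h2 (heading r2)))
        (and (every (lambda (a) (member a h2)) h1)
             (every (lambda (a) (member a h1)) h2))))

    (define (relvar-assign! rv new-rel)
      (unless (same-heading? (relvar-relation rv) new-rel)
        (error "relvar-assign!: heading mismatch" (relvar-name rv)))
      (unless (every (lambda (check) (check new-rel)) (relvar-constraints rv))
        (error "relvar-assign!: constraint violation" (relvar-name rv)))
      (relvar-set-relation! rv new-rel))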

This leads to another idea, which is to identify types with domain constraints.  A type, in the context of my library, is a pair <p, name>, where name is the name of the type and p is a predicate that returns TRUE on elements of the type.  There is no requirement that types be disjoint.  In addition, the type is associated with the dynamic analogues of Haskell type classes Eq, Ord, and perhaps Hash: that is, we know for each type how the = and < operators work and how a hash code for each element of the type is generated (which must be consistent with =).
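A sketch of that pair-plus-operations idea (hypothetical names; the hash is a toy, shown only to make the "consistent with =" point concrete):

    (import (scheme base))

    (define-record-type <rm-type>
      (make-rm-type name predicate equal-proc less-proc hash-proc)
      rm-type?
      (name       rm-type-name)
      (predicate  rm-type-predicate)
      (equal-proc rm-type-equal)
      (less-proc  rm-type-less)
      (hash-proc  rm-type-hash))

    ;; A type is its name plus a membership predicate, together with =, <
    ;; and a hash that is consistent with =; types need not be disjoint.
    (define integer-type
      (make-rm-type 'integer exact-integer? = < (lambda (n) (modulo n 4096))))

    ((rm-type-predicate integer-type) 42)    ; => #t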

I've also realized that I'll need update, delete, and insert for public relvars in the name of efficiency, though in private relvars they can be implemented without special logic.  I haven't worked out the details yet.

In addition, I'm also thinking about how to use SQL tables to store relation columns rather than whole relations, but I don't have anything definite yet.

 

I've decided for the purposes of my library not to bother with relvar objects, and simply have predicates that check constraints on a relation (or two relations in the case of foreign keys).
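A sketch of that predicate-only approach over the toy representation (hypothetical names; key-holds? checks a candidate key, foreign-key-holds? an inclusion dependency; SRFI 1 assumed):

    (import (scheme base) (srfi 1))

    (define (values-of tuple attrs)
      (map (lambda (a) (cdr (assq a tuple))) attrs))

    ;; Key constraint: no two tuples agree on all the key attributes.
    (define (key-holds? rel key-attrs)
      (let ((keys (map (lambda (t) (values-of t key-attrs)) rel)))
        (= (length keys) (length (delete-duplicates keys equal?)))))

    ;; Foreign key: every projection of rel on attrs appears among the
    ;; projections of target-rel on target-attrs.
    (define (foreign-key-holds? rel attrs target-rel target-attrs)
      (every (lambda (t)
               (member (values-of t attrs)
                       (map (lambda (u) (values-of u target-attrs)) target-rel)))
             rel))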
