The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

A tuple operation: UPDATE t1 FROM t2

RM Pre 6 says "... [a D] shall include operators analogous to the RENAME, project, EXTEND, and JOIN operators of the relational algebra". Here's an operator for tuples that isn't analogous to operations on relations. I'm thinking it's more useful over tuple vars/values within a D program than on relvars/relations. I'll give its syntax/semantics first; then motivate it; then link my musings back to previous threads (especially the 'Java gets tuples'). Of course the syntax/keywords are for illustration only/in Tutorial D style.

  • Example (these are more likely to be tuple vars than literals)
    T4 := UPDATE TUP{ S# 'S1', P# 'P2', QTY 3 } FROM TUP{ QTY 4 };
    Result: T4 == TUP{ S# 'S1', P# 'P2', QTY 4 }
  • General form UPDATE t1 FROM t2, in which
    t1, t2 tuples; heading of t2 must be a subset of the heading of t1.
  • Behaviour: return a tuple with the same heading as t1, taking values from t2 for those attributes in common, otherwise values from t1.

Note this isn't symmetrical, it isn't a JOIN or a merge or append. I guess it might be called OVERWRITE ... FROM ..., but that carries nasty overtones of destructive update.

Tutorial D is one of the few languages fully supporting 'first-class', anonymous, named-perspective, free standing tuples. (Many languages support some subset of those features.) That is:

  • TUPLE is first-class per RM Pre 6; I don't need to explain that further here.
  • 'Anonymous' (somewhat misleading, but that's the term) in that I don't need to pre-declare the type (its heading), I can throw together any old attribute names and types. Contrast the Java tuples proposal (very similar to Haskell) you have to pre-declare the choice of fields and their type, and give the structure (the class) a name. ("Anonymous" is misleading because as per RM Pre 6 "the name of that type shall be, precisely, TUPLE H.")
  • Named-perspective because the only way to access components is via attribute name. Contrast the Java proposal exposes field-position when you create/initialise a new instance; Haskell similarly, what's worse there's back-door ways to access by position.
  • Free-standing in that a TUPLE{ } generator can appear anywhere in an expression. Many languages support similar, typically with parens: ( 'S1', 'P2', 3 ), but that's positional access only; and very prone to mis-ordering with no type-checking if there's just a sequence of numbers or Strings. Arguably a typical argument-list falls in the same category: foo( 'S1', 'P2', 3).

It's that use as argument-lists that's giving me particular pain at the moment working with C/C++ code. I'm working with many data structures (represented as pointers) paired with offsets into them or parallel structures. But in C I can't represent them even as tuple pairs (p1, o1) -- which is what I'd do in Haskell. Furthermore there's many procedure calls with two such pairs to compare/validate one against the other. With foo(p1, o1, p2, o2) the type checker won't detect if I've got them out of sequence (they're all just pointers). I'd rather write r := foo{givenp = p1, giveno = o1, newp = p2, newo = o2}, in which foo takes one argument that's a tuple, rather than four arguments, and I pass the whole tuple around the program, structured by the attribute names. (I'm thinking the TUP{ } Selector is redundant there.)

The tuple UPDATE idiom isn't instead of relvar UPDATE -- that should still be a set-at-a-time operation, not RBAR. Rather, I am thinking about typical CRUD maintenance/screen dialogue where the code fetches each screen field into a distinct variable, then:

UPDATE S WHERE (S# = 'S7') : { NAME := namevar, STATUS := statusvar, CITY := cityvar };

With the typical tens of fields coming from the screen, it's too easy to mix up those variables. Instead have a single tuple var in the program holding those changes together (possibly built up from multiple tuple EXTENDs) and a shorthand form:

UPDATE S WHERE (S# = 'S7') : * FROM changesvar;

in which * denotes the 'current tuple' in S, updated from the attributes named in changesvar.

Similarly useful would be a

EXTEND TUPLE t1 : t2  // rather than EXTEND TUPLE t1 : { STATUS := statusvar, CITY := cityvar }

In which t2 == TUP{STATUS statusvar, CITY cityvar}.

t1, t2 tuples with disjoint headings; union the two.

Edit: mmm I guess that's tuple t1 TIMES t2 aka JOIN.

Motivation

It's not very onerous to declare a named record type in the post-C-family of languages. But for a temporary 'holding slot' you don't need the overhead of turning the structure into a class, furthermore it typically starts as just a few elements, then gets bigger through program maintenance/application feature creep. So a full-blown class/declared structure is a little too onerous; and then its content grows until the possible positional confusion becomes a point of pain. Code needs a more ad-hoc but still well-typed mechanism. There's a similar story with arguments to functions: if there's only a couple of args, a declared structure for a handful of calls seems overkill; but then there's a need for an extra arg, and another, ...

Musings

This thought grew out of a dissatisfaction with that proposal for 'Java gets tuples'; and because Haskell already has pretty-much that same functionality (actually with somewhat richer features) -- which is so expressively limiting that Haskellers are continually coming up with improvements. But even Java's streamlined declarations have too much 'ceremony'. Furthermore if a language has first-class, anonymous, etc tuples, a declared data structure could be just a naming wrapper around a tuple, with consistent access methods. So

TYPE SCHANGES IS TUP{S# S#, SNAME NAME, STATUS INT, CITY CHAR};

TYPE Point IS TUP{xCoord RAT, yCoord RAT};

Then I seem to have done without TTM Selectors. Similarly in OOP languages, or Haskell, I've done without data constructors. (Not really: the TUP is a Selector, the attribute names are tantamount to Selectors.) Of course I could add an explicit/named Selector/constructor as a wrapper, but why another level of indirection?

TYPE Point IS Point( TUP{xCoord RAT, yCoord RAT} );

I said "Tutorial D is one of the few languages fully supporting 'first-class', anonymous, ... tuples"; there's another language: an experimental feature in an old Haskell compiler Hugs 'TRex' = Typed Records with extensibility. When I first looked at it, it seemed like just another variant of the Haskell records standard. Because it's (ahem) 'thinly' documented, and includes extra developments that never made it into the academic papers, I didn't at first realise the significance of that last point above about not needing a Selector/data constructor wrapper. So I've actually been writing code like:

TYPE SCHANGES ALIAS TUP{S# S#, SNAME NAME, STATUS INT, CITY CHAR};

TYPE Point ALIAS TUP{xCoord RAT, yCoord RAT};

(Haskell has a feature whereby you can declare an atomic type name as alias for a type expression; then the name behaves as just another type, can be used in a signature for an operator, etc, etc.) Without the Selector/data constructor in the way, you can write polymorphic code over record structures with specialisation-by extension: convert to Polar the Point 'within' any record -- that is, for any TUP{ } including xCoord, yCoord. That's helped by the ability in TRex to say: get these fields out of the tuple, bind the leftover fields into another variable -- which is just some ad-hoc tuple type that can be glued back when you want to return a result.

So assuming some version of tuple NOT MATCHING (*) has been defined as, say,

t1 NOT MATCHING t2 : return type is the [tuple type corresponding to the] "projection" of the heading of t1 onto the non-common attributes, return value is the projection of t1 itself onto those non-common attributes.  "Common or non-common" may be defined as "merely the attribute name needs be the same" but also as "attribute name identical and corresponding attribute types in both tuples are union-compatible".  Deliberately leaving that open here.

, that would be the read-only operator that returns

(t1 NOT MATCHING t2) UNION t2 ?

Fair enough.  There will probably be many more of such operators for which it seems convenient to have these operators in the language because they make the ***programming*** more elegant and easier.  TTM was not written primarily with that goal in mind (I mean TTM was written specifically with the data manipulation portion of programming in mind), and has left that field (of useful tuple-level operators) largely unexplored, presumably for that very reason.

Quote from Erwin on April 4, 2020, 12:41 pm

So assuming some version of tuple NOT MATCHING (*) has been defined as, say,

Thanks Erwin, but I think we'd want to call that REMOVE, per Appendix A's operator that implements projection {ALL BUT ...}.

t1 NOT MATCHING t2 : return type is the [tuple type corresponding to the] "projection" of the heading of t1 onto the non-common attributes, return value is the projection of t1 itself onto those non-common attributes.  "Common or non-common" may be defined as "merely the attribute name needs be the same" but also as "attribute name identical and corresponding attribute types in both tuples are union-compatible".  Deliberately leaving that open here.

, that would be the read-only operator that returns

(t1 NOT MATCHING t2) UNION t2 ?

(t1 REMOVE t2)TIMES t2, I think -- that is, trying to keep the analogue with the relational operators of that name, per RM Pre 6. With a side condition t2's heading be a subset of t1's.

Fair enough.  There will probably be many more of such operators for which it seems convenient to have these operators in the language because they make the ***programming*** more elegant and easier.  TTM was not written primarily with that goal in mind (I mean TTM was written specifically with the data manipulation portion of programming in mind), and has left that field (of useful tuple-level operators) largely unexplored, presumably for that very reason.

Yes I was thinking of the programming convenience, rather than the DML part.

It's difficult to think of many relational operators, per RM Pre 6 that have unproblematic analogues for tuple operations: UNION would return a tuple only if the operands were the same tuple; JOIN (mentioned in RM Pre 6) only if the headings were disjoint (then it's TIMES) or if the operands had the same value for attributes in common. Tuple [NOT]MATCHING would return either its left operand or what?

And yet ... every time I think I've nailed down all possible programming-convenient tuple operations, and boiled them down to a few primitives, I come up with counter-examples. So working on Haskell/Hugs/TRex gives me a possibility to express operations in a Field-by-agonising-Field idiom. Hmmm.

Quote from AntC on April 4, 2020, 10:51 pm
Quote from Erwin on April 4, 2020, 12:41 pm

So assuming some version of tuple NOT MATCHING (*) has been defined as, say,

Thanks Erwin, but I think we'd want to call that REMOVE, per Appendix A's operator that implements projection {ALL BUT ...}.

t1 NOT MATCHING t2 : return type is the [tuple type corresponding to the] "projection" of the heading of t1 onto the non-common attributes, return value is the projection of t1 itself onto those non-common attributes.  "Common or non-common" may be defined as "merely the attribute name needs be the same" but also as "attribute name identical and corresponding attribute types in both tuples are union-compatible".  Deliberately leaving that open here.

, that would be the read-only operator that returns

(t1 NOT MATCHING t2) UNION t2 ?

(t1 REMOVE t2)TIMES t2, I think -- that is, trying to keep the analogue with the relational operators of that name, per RM Pre 6. With a side condition t2's heading be a subset of t1's.

Fair enough.  There will probably be many more of such operators for which it seems convenient to have these operators in the language because they make the ***programming*** more elegant and easier.  TTM was not written primarily with that goal in mind (I mean TTM was written specifically with the data manipulation portion of programming in mind), and has left that field (of useful tuple-level operators) largely unexplored, presumably for that very reason.

Yes I was thinking of the programming convenience, rather than the DML part.

It's difficult to think of many relational operators, per RM Pre 6 that have unproblematic analogues for tuple operations: UNION would return a tuple only if the operands were the same tuple; JOIN (mentioned in RM Pre 6) only if the headings were disjoint (then it's TIMES) or if the operands had the same value for attributes in common. Tuple [NOT]MATCHING would return either its left operand or what?

And yet ... every time I think I've nailed down all possible programming-convenient tuple operations, and boiled them down to a few primitives, I come up with counter-examples. So working on Haskell/Hugs/TRex gives me a possibility to express operations in a Field-by-agonising-Field idiom. Hmmm.

See a relational tuple as the set of (attrname attrvalue) pairs that it is (and a tuplevar as a "setvar" of such set type).  Disregard entirely the role tuples play in the relation algebra (and specifically get JOIN and cartesian product out of your head for a minute).

Now it should be clear that :

  • the only fundamental operations we need to do anything we'd need to do to a tuplevar are INSERT and DELETE (completely analogous to the only two operations we need for manipulating relvars), possibly "simultaneous" as in MA (if your tuplevars could be subject to declared constraints that could prohibit the INSERTS/DELETES to be done separately and in sequence), and possibly in both "lax" and "non-lax" form.
  • the only fundamental operations we should be needing to "compute" with tuple values are analogs to UNION MINUS INTERSECT from set algebra.

Instinctively I'd always interpret REMOVE as DELETE and place it in the operations-to-variables league.  (I've always found that UPDATE read-only operator a hopeless misnomer and hopelessly confusing for that reason.)

The issue with DELETE/MINUS is that its operation needs only a set of attribute names as the "second" argument, it doesn't need a full-blown tuple so if you really want to descend to bare roots, then that's how you'd need to define it, and now "set of attribute names" must also become a first-class citizen (or at least very very close) of your language.  Also, there's the question here of "lax" vs. "non-lax" (or both).

INTERSECT could be thought of in two flavours : one operating on two tuples, and returning only the common (attrname attrvalue) pairs, another one operating on one tuple and one set of attribute names (thus giving "retain only these attrs" semantics).  And you once again have the "lax" and "non-lax" thing here.

UNION (your TIMES) should, in addition to just doing the set union of the pairs, always check that attribute names are still unique and identifying in the result tuple.

ADDIT

INTERSECT might not be strictly needed, I just realized it should remain equivalent to ((A UNION B) MINUS (A MINUS B)) MINUS (B MINUS A), but you'd need to figure out the "lax vs non-lax" question for each of those four MINUSes, and you might want that "exception" raised by UNION to not occur here, e.g. when taking the intersection of the unary tuples (NAME 2) and (NAME 5).  And you probably want this INTERSECT to exist for the ability to tack its specific optimal evaluation strategy onto the construct, rather than having to recognize the longhand for the shorthand.

Quote from Erwin on April 5, 2020, 11:41 am
Quote from AntC on April 4, 2020, 10:51 pm
Quote from Erwin on April 4, 2020, 12:41 pm

So assuming some version of tuple NOT MATCHING (*) has been defined as, say,

Thanks Erwin, but I think we'd want to call that REMOVE, per Appendix A's operator that implements projection {ALL BUT ...}.

t1 NOT MATCHING t2 : return type is the [tuple type corresponding to the] "projection" of the heading of t1 onto the non-common attributes, return value is the projection of t1 itself onto those non-common attributes.  "Common or non-common" may be defined as "merely the attribute name needs be the same" but also as "attribute name identical and corresponding attribute types in both tuples are union-compatible".  Deliberately leaving that open here.

, that would be the read-only operator that returns

(t1 NOT MATCHING t2) UNION t2 ?

(t1 REMOVE t2)TIMES t2, I think -- that is, trying to keep the analogue with the relational operators of that name, per RM Pre 6. With a side condition t2's heading be a subset of t1's.

Fair enough.  There will probably be many more of such operators for which it seems convenient to have these operators in the language because they make the ***programming*** more elegant and easier.  TTM was not written primarily with that goal in mind (I mean TTM was written specifically with the data manipulation portion of programming in mind), and has left that field (of useful tuple-level operators) largely unexplored, presumably for that very reason.

Yes I was thinking of the programming convenience, rather than the DML part.

It's difficult to think of many relational operators, per RM Pre 6 that have unproblematic analogues for tuple operations: UNION would return a tuple only if the operands were the same tuple; JOIN (mentioned in RM Pre 6) only if the headings were disjoint (then it's TIMES) or if the operands had the same value for attributes in common. Tuple [NOT]MATCHING would return either its left operand or what?

And yet ... every time I think I've nailed down all possible programming-convenient tuple operations, and boiled them down to a few primitives, I come up with counter-examples. So working on Haskell/Hugs/TRex gives me a possibility to express operations in a Field-by-agonising-Field idiom. Hmmm.

See a relational tuple as the set of (attrname attrvalue) pairs that it is (and a tuplevar as a "setvar" of such set type).  Disregard entirely the role tuples play in the relation algebra (and specifically get JOIN and cartesian product out of your head for a minute).

No. I strongly dislike that line of thought. Unlike relations as sets-of-tuples, sets-of-AV-pairs are not homogenous: each attribute type is different potentially. So the TTM heading for a tuple has different impact vs its role with relations. And for the operations you describe below (none of which do I like), each would have to check that the resulting tuple value is a TTM tuple -- that is, that the set of A-T pairs is a heading.

[Rest left in]

Now it should be clear that :

  • the only fundamental operations we need to do anything we'd need to do to a tuplevar are INSERT and DELETE (completely analogous to the only two operations we need for manipulating relvars), possibly "simultaneous" as in MA (if your tuplevars could be subject to declared constraints that could prohibit the INSERTS/DELETES to be done separately and in sequence), and possibly in both "lax" and "non-lax" form.
  • the only fundamental operations we should be needing to "compute" with tuple values are analogs to UNION MINUS INTERSECT from set algebra.

Instinctively I'd always interpret REMOVE as DELETE and place it in the operations-to-variables league.  (I've always found that UPDATE read-only operator a hopeless misnomer and hopelessly confusing for that reason.)

The issue with DELETE/MINUS is that its operation needs only a set of attribute names as the "second" argument, it doesn't need a full-blown tuple so if you really want to descend to bare roots, then that's how you'd need to define it, and now "set of attribute names" must also become a first-class citizen (or at least very very close) of your language.  Also, there's the question here of "lax" vs. "non-lax" (or both).

INTERSECT could be thought of in two flavours : one operating on two tuples, and returning only the common (attrname attrvalue) pairs, another one operating on one tuple and one set of attribute names (thus giving "retain only these attrs" semantics).  And you once again have the "lax" and "non-lax" thing here.

UNION (your TIMES) should, in addition to just doing the set union of the pairs, always check that attribute names are still unique and identifying in the result tuple.

ADDIT

INTERSECT might not be strictly needed, I just realized it should remain equivalent to ((A UNION B) MINUS (A MINUS B)) MINUS (B MINUS A), but you'd need to figure out the "lax vs non-lax" question for each of those four MINUSes, and you might want that "exception" raised by UNION to not occur here, e.g. when taking the intersection of the unary tuples (NAME 2) and (NAME 5).  And you probably want this INTERSECT to exist for the ability to tack its specific optimal evaluation strategy onto the construct, rather than having to recognize the longhand for the shorthand.