The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Operators on tuples/'Expressively complete'/RM Pre 6

Page 1 of 5Next

(I feel sure there's already a thread on this subject/I can't find it. No matter.)

(I'm hoping saying 'expressively complete' is sufficiently far away from any existing term of art (especially from 'Relationally complete'); but also suggestive enough you'll get what I mean.)

RM Pre 6 re TUPLE types

The applicable operators shall include operators analogous to the RENAME, project, EXTEND, and JOIN operators of the relational algebra (see RM Prescription 18), ...

I've been working on a 'minimal' set of operators over tuples, taking a generous interpretation of RM Pre 6's "analogous to". Specifically, I wanted to avoid JOIN because it can fail at run-time (same-named attributes having differing values); but provide operators with which the programmer can mix-and-match their own, maybe:

  • detect if two TUPLEs are joinable;
  • throw an exception if not; or
  • return a value with different wrappers depending.

I'm assuming the D-like already supports syntax to create Tuples from attribute names and expressions: TUPLE{ PNO "P123", SNO "S456", QTY  f(x + y) } .

To define TupleJoinable for this purpose, it's useful to define TupleProject as using a tuple to specify the attribute names on which to project:

  • t1 TupleJoinable t2 =df (t1 TupleProject t2) == (t2 TupleProject t1);
  • t1 TupleProject t2 =df <a tuple with those attributes from t1 whose names also appear in t2>;

Similarly TupleRemove using a tuple to specify the attribute names to remove:

  • t2 TupleRemove t1 =df <a tuple with those attributes from t2 whose names don't appear in t1>;

Note there's no requirement with TupleProject, TupleRemove that the r.h. operand's attribute names be a subset of the l.h. operand; nor that corresponding attribute names be at the same attribute type. In case of type mis-match, the attempted equality test will get rejected at compile time. OTOH for 'gluing together' tuples, there must be no attribute names in common. What to call that operation? Append or Concat too much suggests a sequencing of attributes. It's not TupleUnion, because of the requirement for attribute disjointedness. I'll use TupleExtend -- which'll handily fulfill another requirement in RM Pre 6.

  • t1 TupleJoin t2 =df if (t1 TupleJoinable t2) then (t1 TupleExtend (t2 TupleRemove t1)) else (error handling);

TupleExtend is to be commutative.

Then TupleRename is built from <tuple> TupleProject TUPLE{ <attrName> <don'tcarevalue> } to get THE_<attrName> value at the as-was attribute; TupleRemove to snip off that attribute (again with a `don'tcarevalue' in a singleton Tuple literal); TupleExtend to glue on the to-be attribute (as a singleton Tuple literal).

So I seem to have satisfied RM Pre 6 with merely three primitive operators (plus syntax for Tuple literals, plus THE_<attrName>).

Or have I?

This post is by way of asking for test cases over which I can run my implementation -- in Haskell/Trex yay!

[Somehow I knew who would be waking up the list again.  Long time no see/hear/read/etc.]

I'm not sure I'm with you on the meaning or intention of your proposed operators. I would have thought there should be:

  • TUPCON to create a constant value (the result of an out of scope computation)
  • REMOVE, to remove an attribute
  • EXTEND, to add an attribute (kind of JOIN-ish,  like EXTEND).

Seems to be 'expressively complete'. Everything else therefore must be expressed in terms of those three AFAICT.

Maybe the question should be: what would be useful in practice? Answer: None of the above?

Andl - A New Database Language - andl.org

I suspect that the TUPLE operator you had in mind when you wrote the following

  • detect if two TUPLEs are joinable;
  • throw an exception if not; or
  • return a value with different wrappers depending.

is just the tuple UNION operator :

  • returns the set union of the <attribute name, attribute value> pairs of the tuples acting as argument to the UNION
  • throws an exception if that set union is not a TTM tuple (i.e. if the returned set would not satisfy the rule of having unique attribute names in its result)
  • I don't know what to make of your third bullet.

And for those who want to nitpick on this answer : there's always a choice to be made between losing the rigour and losing the audience.

Maybe the question should be: what would be useful in practice? Answer: None of the above?

TUPLE UNION as I described is most definitely an operator that can at least reasonably be argued to be ***BUILT UPON*** by relational JOIN.

***HOW*** are you going to build the result of any relational JOIN in your relational DBMS if you ***DON'T*** have TUPLE UNION to build on ?

I guess it's all a matter of perspective ...

Quote from dandl on September 27, 2022, 1:42 pm

I'm not sure I'm with you on the meaning or intention of your proposed operators. I would have thought there should be:

  • TUPCON to create a constant value (the result of an out of scope computation)

Yes, provided by:

> syntax to create Tuples from attribute names and expressions: TUPLE{ PNO "P123", SNO "S456", QTY  f(x + y) } .

The attribute values "expressions" can be constants.

  • REMOVE, to remove an attribute
  • EXTEND, to add an attribute (kind of JOIN-ish,  like EXTEND).

Yes, I have TupleRemove, TupleExtend. They take both operands as tuples, so that you can remove/extend multiple attributes in one operation. (But of course that might be a single attribute at a time; and that single attribute might be expressed as a constant TUPLE{ SNO don'tcarevalue }.)

Seems to be 'expressively complete'.

No: compare Relational operators like JOIN: their operands are considered by their whole Heading; they return a result with a whole Heading. That's not defined in terms of breaking down to attribute-at-a-time. How would you express a polymorphic operation like JOIN/TupleJoin attribute-at-a-time when it doesn't know 'in advance' what attributes might be in its operands? You need to distinguish the attribute appearing in both operands vs only in one.

Everything else therefore must be expressed in terms of those three AFAICT.

Maybe the question should be: what would be useful in practice? Answer: None of the above?

By saying 'Expressively complete' I'm trying to avoid skewing the field as to "useful in practice". No the ones I've given aren't sufficient to be useful. But for example TupleJoinable is useful and can be defined in terms of the primitives.

A 'fully featured' D would include more tuple operators -- just as Tutorial D includes [NOT] MATCHING.

Quote from Erwin on September 27, 2022, 8:23 pm

I suspect that the TUPLE operator you had in mind when you wrote the following

  • detect if two TUPLEs are joinable;
  • throw an exception if not; or
  • return a value with different wrappers depending.

is just the tuple UNION operator :

I considered that option. No it doesn't meet what I "had in mind". I hope you can see you could build your UNION from the primitives I give. Here's why I think UNION should not be in the primitives:

  • returns the set union of the <attribute name, attribute value> pairs of the tuples acting as argument to the UNION

No, it needs some way to check first if the operands are TupleJoinable.

  • throws an exception if that set union is not a TTM tuple (i.e. if the returned set would not satisfy the rule of having unique attribute names in its result)

No, some programmers and programming languages are allergic to throwing exceptions -- especially since you might be using tuple joining to implement relational JOIN (as your follow-on message suggests) then: in pairwise combining the tuples from JOIN's operands it's to be expected (not an exception) for some pairs to fail to be joinable; the implementation of JOIN should be able to test for joinability rather than handle it through throw/catch.

  • I don't know what to make of your third bullet.

Some programmers/programming languages prefer to always return a well-typed result from a well-typed expression. So they might prefer to return a Relation from an attempted TupleJoin: a singleton if TupleJoinable, otherwise empty. That's one way to provide "different wrappers depending". I don't want to impose any particular approach (allergic though I am to throwing exceptions) so I'm giving a minimal 'expressively complete' toolkit.

And for those who want to nitpick on this answer : there's always a choice to be made between losing the rigour and losing the audience.

There'd be more operators built from my toolkit, to fit different audiences. Whatever your audience, I hope you're not really suggesting an operator should build a pseudo-tuple that is not a TUPLE -- because it has same-named attributes with differing values. IOW, although a TTM TUPLE is certainly a set of attribute name-value pairs; not every set of such pairs is a TUPLE; using a set operator like UNION is misleading: it would lose me as an audience.

***HOW*** are you going to build the result of any relational JOIN in your relational DBMS if you ***DON'T*** have TUPLE UNION to build on ? [From your follow-up.]

I have TupleJoinable; and that's defined in terms of primitives/the toolkit that I'm claiming is 'expressively complete'. I take the tuples pairwise from each operand to the JOIN; I test TupleJoinable: if so, emit the (t1 TupleExtend (t2 TupleRemove t1)) (which uses only primitives); else emit nothing; go on to the next pairing.

There is a mathematical reality, which is that the "tuple JOIN" you have in mind (and I hope I'm reading your mind correctly) ***IS*** a set union of <name, value> pairs.  There is also the mathematical reality that not all such set unions satisfy the "FD" {name}->{value}.  So there is this "exception" case to be considered and to be defined what "tuple JOIN" will do when presented such a case.

But if you let "some programmers' allergies" prevail over mathematical reality, then I really don't think there's any fruitful discussion to be had.

(BTW there's also a design discipline of "choosing the proper name".  Date revived an old paper of his on this very subject in one of his latest ("Stating the Obvious", chpt 3).  "joinable" seems to be about asking a Y/N question and therefore the return type of such an operator had better be just BOOLEAN.)

(And BTW2 your TupleRemove operator seems to suffer from a somewhat similar design flaw in that the user is required to give an entire tuple as the second argument when all that is effectively used is just the set of attribute names therein (this requires you to specify what will happen if t2 contains an attribute with the same name as some attribute in t1, but with a value of a totally different type).  Why does relational projection not require the user to provide a full relation as the argument providing the "projection spec" ???  Because it's not necessary.  If all that is needed to make the operator work is the set of attribute names, then require nothing more than the set of attribute names.  Seems like a sound principle to me.  If it causes problems in currently existing languages, the problem is not mine, it's the currently existing language's.)

And especially regarding this one in particular :

"the implementation of JOIN should be able to test for joinability rather than handle it through throw/catch."

Please provide convincing proof of this dogma.

Quote from Erwin on September 28, 2022, 6:26 pm

There is a mathematical reality, ...

... to the effect that one set of operators is equivalent to another just in case you can express all the same operations using combinations of them.

With the set of operations I give, I can combine to produce a tuple that is the union of its operands -- each considered as a set of attribute name-value pairs -- and providing same-named attributes are at same value.

There is no "mathematical reality" that I must have a primitive spelled UNION.

 "joinable" seems to be about asking a Y/N question and therefore the return type of such an operator had better be just BOOLEAN

Yes it is just BOOLEAN. My o.p. illustrates a call to it as the condition in an IF. If you'd prefer to throw an exception, your ELSE branch can do that.

 TupleRemove operator seems to suffer from a somewhat similar design flaw in that the user is required to give an entire tuple as the second argument when all that is effectively used is just the set of attribute names therein

I illustrate the usefulness of taking tuples to denote a set of attribute names, exactly with that TupleJoinable -- which is not a primitive:

  • t1 TupleJoinable t2 =df (t1 TupleProject t2) == (t2 TupleProject t1);

Without that usefulness, I'd have to express those TupleProjects using what? to marshall each tuple into an attribute name list.

Note neither Tutorial D nor relational theory has a 'first class' construct "set of attribute names" -- that is, you can't assign such a set to a variable; pass it as an argument to a function; extract a set from a relation's or tuple's Heading; etc. In Tutorial D, the r.h. syntactic construct to the project or REMOVE operations can only be a literal.

dogma

That comment is applied to my saying "should be able to ...". I didn't say 'must only ...'.

I put up the question:

a) because the forum is getting moribund;

b) because I wasn't sure if this set was expressively complete.

You've illustrated why a). You've not helped wrt b).

If you aesthetically don't like my set, that's fine. Can you see that what I give is a toolkit to construct a set you'd prefer? Note you'd need to invent a new entity "set of attribute names".

Assume a dataset in the form of a 'tuple soup', each tuple consisting of:

  • a set of names (the 'heading')
  • Values corresponding to each of those names
  • A name and value together constitute an attribute. Each value belongs to a type, so each name also carries the type of that value.

This is a perfectly plausible data structure. Perhaps many of those tuples contain 'key' values, and the tuples are disparate facts known about each key.

The plausible operations on this 'tuple soup' will always return a new tuple value. The primitives are:

  1. T() is the null tuple
  2. T.extend(name, value) adds an attribute to a tuple if it does not already exist.
  3. T.remove(name) removes an attribute from a tuple if it exists
  4. T1 + T2 merges two tuples, or fails
  5. TI - T2 removes attributes (names) in one found in another, or fails

AFAICT it is straightforward to write queries that can transform the tuple soup in various ways, including  adding new tuples (facts) and filling them out with default values, in order to extract relations from the soup.

Now why do you want something called JOIN? In fact, why do you need more than this?

Andl - A New Database Language - andl.org
Page 1 of 5Next