The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

Support for JOIN alternatives

Hello!

I have a question about the RM Proscriptions in The 3rd Manifesto for practical purposes:

Is cohesive information output from a D-compatible database for practical purposes considered respectable and interesting, too? If not then why?

Sorry if I am missing something basic here, but I can't find the D "equivalent" of the SQL outer JOIN. I feel, however, that it is the responsibility of a relational database to keep all information about the entities in it, the relations between them as well as the entities missing from these relations accessible at its interface within reasonable effort. In other words, I see no point in decoupling cohesive information as it is read from a relational database, and then well, outside the relational database finding out the missing relations for e.g. displaying purposes in table form. A more versatile set of output operations would be helpful that improves atomicity and need not interfere too much with the inner algebra of the system.

Is Rel one of the D-compatible database management systems? I haven't found an elegant (declarative) way of getting the entities missing from the relations in the result set out of it yet, so any help in this regard would be appreciated, too.

Thanks for any insight in the underlying considerations and information about planned solutions!

Andres

Example in Tutorial D. Let the aforementioned database contain information about a few persons associated with their phones in a full variety of ways. By means of a non-declarative hack, a relation that also contains information about the missing relations can be constructed even if not every person has a phone which violates the following RM proscription:

4. D shall include no concept of a “relation” in which some “tuple” includes some “attribute” that does not have a value.

I can see the point, but why not separate concerns and provide efficient means for manipulating output? As SQL has a LEFT JOIN operator that can be used for this purpose and NULL is not the most suitable marker of missing data, a GROUP-clause (not in the SQL sense) for JOIN could be introduced in D for e.g. outputting purposes in table form. Assuming that I am not missing a better solution already available, the code below is obviously not a viable solution for practical purposes.

Does Tutorial D support a more compact alternative to the SQL "X left join Y" clause where in the relational domain missing data from Y joined with tuples of X would be marked by (alternatively put: grouped as) an empty new relation Z, and tuples from Y in the result would be similarly grouped as relation Z? If not then what is the Tutorial D approach to retrieving e.g. a list of all persons with phones attached to those associated with at least one, and handing the output over to the UI? How can the following code be simplified?

VAR person REAL RELATION {name CHARACTER} KEY {name};
VAR phone REAL RELATION {number INTEGER, owner CHARACTER} KEY {number};
CONSTRAINT phone_fkey_person phone { owner } RENAME { owner AS name } <= person { name };
person := RELATION {
TUPLE {name 'Jack'},
TUPLE {name 'Jill'},
TUPLE {name 'Jane'}
};
phone := RELATION {
TUPLE {owner 'Jack', number 123},
TUPLE {owner 'Jane', number 456},
TUPLE {owner 'Jane', number 789}
};
VAR tmp REAL RELATION {name CHARACTER} KEY {name};
tmp := person RENAME {name AS j} SEMIJOIN ( phone RENAME {owner AS j} ) RENAME {j AS name};
tmp := person MINUS tmp;
VAR leftjoin REAL RELATION {name CHARACTER, phones RELATION {number INTEGER}} KEY {name};
leftjoin := EXTEND tmp : {phones := RELATION {number INTEGER} {}};
leftjoin := leftjoin UNION ( person RENAME {name AS j} JOIN ( phone RENAME {owner AS j} ) GROUP {number} AS phones RENAME {j AS name} );

Quote from asulling on November 24, 2018, 11:54 am

Is cohesive information output from a D-compatible database for practical purposes considered respectable and interesting, too? If not then why?

It's not clear to me what you're asking.

Quote from asulling on November 24, 2018, 11:54 am

Sorry if I am missing something basic here, but I can't find the D "equivalent" of the SQL outer JOIN.

There isn't one, yet. Which of several possible solutions is preferred is a matter of perhaps some controversy, though I would argue that a solution proposed by Erwin -- if I recall correctly -- seems the most palatable. After a brief and not particularly thorough search of the archives (I am lazy), I couldn't find the specific post(s) in the archives, so from memory it could be something like...

p LEFTJOIN q

...which would return a TUPLE {matched SAME_TYPE_AS(p JOIN q), missing SAME_TYPE_AS(q)} per obvious semantics.

Quote from asulling on November 24, 2018, 11:54 am

Is Rel one of the D-compatible database management systems?

It is an implementation of Tutorial D, so by definition it should be.

I haven't implemented the above LEFTJOIN in Rel, but looking at it now, I think it's worthwhile including it as a Rel-specific extension. I have added it to my "to do" list.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from asulling on November 24, 2018, 11:54 am

Does Tutorial D support a more compact alternative to the SQL "X left join Y" clause where in the relational domain missing data from Y joined with tuples of X would be marked by (alternatively put: grouped as) an empty new relation Z, and tuples from Y in the result would be similarly grouped as relation Z? If not then what is the Tutorial D approach to retrieving e.g. a list of all persons with phones attached to those associated with at least one, and handing the output over to the UI? How can the following code be simplified?

Using your example relvars, if I need each person's phone number, I'd do this:

person JOIN (phone RENAME {owner AS name})

I wouldn't want to include people without phone numbers in a list of each person's phone numbers, because a not-has-a-phone-number isn't a phone number.

To find out who doesn't have a phone number, do this:

person NOT MATCHING (phone RENAME {owner AS name})

If I had to combine them, I'd do this:

WITH (phoneRenamed := phone RENAME {owner AS name}):
   UNION {
      (EXTEND person JOIN phoneRenamed: {numberOrNot := CAST_AS_CHAR(number)}) {ALL BUT number},
      EXTEND person NOT MATCHING phoneRenamed: {numberOrNot := ''}
   }

But hopefully I wouldn't have to do that.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Oh dear.  How to handle missing information without nulls, again.

The relational problem with SQL outer join is that it [potentially] generates NULL.  Various ways have been researched and proposed to tackle that problem but none of them are likely to be in the league of solutions that you consider as being "within reason".

Anyway, the reason for my reply here is to clarify some stuff that Dave said kind of in my name.

One, SQL has two kinds of "outer join", a "one-way" and a "two-way" (LEFT/RIGHT and FULL).  I'll distinguish between them as appropriate.

SIRA_PRISE (the implementation by my hand if you weren't aware) has a LEFTJOIN operator and to avoid being a source of NULL or any such thing, it requires the user to specify the values to be included in the result in those cases where the SQL left join would have produced NULL.  So SIRA_PRISE's LEFTJOIN is the semantic equivalent of a UNION of :

  • the natural join of the two relations, which will cater for the "matching" tuples to be present
  • an EXTEND of the SEMIMINUS between the two relations, which will cater for the "nonmatching" tuples to be present, with "default" values for the "missing" attributes as should be obvious.

A similar solution could be envisaged for someting equivalent to SQL FULL outer join, but that has not been done in SIRA_PRISE.  And a full-blown D should make it possible for the user to define such an operator as a shorthand.

At any rate, the trouble with this approach is still that once you have the result, there is not guaranteed to be an easy way to distinguish the tuples which came from the natural join (the matching tuples) from the others (the nonmatching ones).  A "default" date of 9999-12-31 could still just as well actually appear in the database in a matching tuple.  Ditto for the zero-length string supposed to represent a phone number.

Also, pls note ***very carefully*** that the equivalent longhand here is a UNION, meaning the predicate of the result is ***disjunctive***, meaning the user is probably seriously deluding himself if he thinks that "looking at both matching and nonmatching tuples in one single table" has somehow made them "mean the same thing".

Another potentially useful idea is the following :

  • Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.
  • The first relation-valued attribute is the natural join (the matching tuples)
  • The second relation-valued attribute is the SEMIMINUS between the first and the second join operand
  • And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

With apologies for distinguishing these three attributes by ordinal position, and not mentioning explicitly that the TTM spirit will require you to NAME these three attributes properly.  (I am lazy too.)

No system I know has gone down that alley, but still a full-blown D should make it possible for the user to *DEFINE* his "outer" joins this way.

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

  • Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.
  • The first relation-valued attribute is the natural join (the matching tuples)
  • The second relation-valued attribute is the SEMIMINUS between the first and the second join operand
  • And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

However, a general solution employing essentially the same notion could use tagged union types instead of sentinel values. E.g., a phone attribute of type INT in the right-hand operand to LEFTJOIN per the examples above might be returned from LEFTJOIN belonging to a type phone_ defined as:

TYPE phone_ UNION;
TYPE phone_missing IS {phone_ POSSREP {}};
TYPE phone_int IS {phone_ POSSREP {phone INT}};

Of course, in a sense it's still a sentinel value, but only in a type that encompasses the full domain of phone numbers without infringing on them.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Just briefly to say that I too found the OP incoherent to the point of wondering what if anything to say in reply. Is this

  • yet another foray into why TTM should support NULL (see lengthy quote from DTATRM in answer to that)
  • a more reasoned enquiry into how to perform certain kinds of JOIN without either generating NULLs or losing tuples.

Assuming the latter, the above replies deal with this perfectly adequately. I would just point out that the questions of 'efficiency' and 'compact' are beside the point. It is beyond question that the RA in any D (such as Rel) can indeed express these queries, and that RM VSS 6 deals adequately with making them convenient.

6. D should provide some means for users to define their own generic operators, including in particular generic relational operators.

Andl - A New Database Language - andl.org

Thanks for updating the Rel development "to do" list and making things clearer to me! As I mentioned in the Rel forum, I am relatively new to Rel, Tutorial D and the underlying theory. I am considering to use the Rel DBMS in a project, trying it out. And I am extremely slow at responding, unfortunately.

By "cohesive information output" (sorry about lacking accuracy) I meant that e.g. not-has-a-phone-number might be useful information outside the database although not a phone number, and it sticks with the person in focus. And a Rel-specific extension or similar is what I meant by "a more versatile set of output operations". I hope that my command of English is good enough for this forum.

 

I wouldn't want to include people without phone numbers in a list of each person's phone numbers, because a not-has-a-phone-number isn't a phone number.

There can be more than one reason for the guy I am trying to contact not showing up in the contacts listing using the plain natural JOIN, assuming that the UI designer has done a good job in minimizing the number of clicks needed to reach the target by displaying phone numbers right next to the persons who have one, and I don't have to make an extra click to open all his contacts with e-mail, homepage and other irrelevant data:

- he doesn't have a phone;
- the guy hasn't even registered yet;
- ...

Depending on the UI implementation, the first scenario is a problem: assuming the above mentioned UI design, this guy's other contacts can't even be reached from this view. The second scenario can be a problem, too, if it can't be distinguished from the first one: sorry that I can't come up with a better example at the moment, but maybe he got stuck in a buggy UI, is too busy to find his way out, and I as developer should find out whether he needs troubleshooting? And based on his feedback improve the UI, of course.

A more obvious example would be a queue of something displayed in a column of a table. We have a tennis court on campus, so practice mate candidates could show up in such a column displaying requests for play for the bookings missing a player. I have a regular practice time on Saturdays when people are in the countryside, so if I used this table to find tennis partners then this column would often be empty at my row. A guy working on weekends and willing to play, however, would spot the empty rows first to avoid ending up in the waiting list as the page opens in his browser, so thanks to a LEFT JOIN output operator (I'll describe below what I mean by that) I would still have a good chance for a weekend tennis practice, had I not found an opponent at my level of play yet.

I would strongly prefer keeping the "matched" and "missing" parts separate as Erwin put it whenever the UNION of these lists need not be sorted which is seldom the case, unfortunately... Hence I am afraid that all of this clutter has a say for practical purposes, but at least not in D, of course, just a little Rel extension to support making the UI user-friendly.

 

Quote from Erwin on November 25, 2018, 6:33 pm

The relational problem with SQL outer join is that it [potentially] generates NULL.  Various ways have been researched and proposed to tackle that problem but none of them are likely to be in the league of solutions that you consider as being "within reason".

Let me clarify my point: I reject the possibility of any NULLs at any inputs in relational algebra. However, I feel that an additional layer for output is needed at database interface which helps keep the UI user-friendly while not interfering with the inner workings of the DBMS too much, i.e. the result of a LEFTJOIN and suchlike in this interface layer would always go right out of the database for non-algebraic purposes. When a view is created or updated in the UI layer, my concern is string output instead of the type safety of the database layer, so being able to sort the UNIONs of the matched and missing parts of LEFT/OUTER JOINs and define default values for the result sets would be enough if I am not missing something important here.

 

Quote from Erwin on November 25, 2018, 6:33 pm

Also, pls note ***very carefully*** that the equivalent longhand here is a UNION, meaning the predicate of the result is ***disjunctive***, meaning the user is probably seriously deluding himself if he thinks that "looking at both matching and nonmatching tuples in one single table" has somehow made them "mean the same thing".

Thanks, maybe an example would be helpful here, too. The way I see it, things in the UI have a different meaning for the user than what is important in the database. I as player need to find another registered user in a sorted listing with or without a phone number, and the rows without a phone number just look a little more empty, that's all. It seems to be an overkill (please correct me if I am wrong) to let ORM for type safety join the matching and non-matching tuples, do the sorting of data output from the database and create a list of objects. Instead I would simply provide a zero-length string as default, out of the database comes a sorted list of tuples with potentially missing attributes cast to strings or defaults, this will be sent to a view and I can't see danger lurking at the moment. Nor can I see a good reason for the absence of constraints allowing zero-length strings to represent phone numbers in a database. I would rather make use of such constraints in a well-designed database to distinguish the zero-length string default value that I provide for displaying purposes from what is retrieved from the database.

I hope that I succeeded in clarifying what I would need if using Rel in my project. Maybe something else could add to this. I can elaborate further next weekend if necessary.

 

Quote from asulling on November 26, 2018, 6:25 am

Thanks for updating the Rel development "to do" list and making things clearer to me! As I mentioned in the Rel forum, I am relatively new to Rel, Tutorial D and the underlying theory. I am considering to use the Rel DBMS in a project, trying it out. And I am extremely slow at responding, unfortunately.

By "cohesive information output" (sorry about lacking accuracy) I meant that e.g. not-has-a-phone-number might be useful information outside the database although not a phone number, and it sticks with the person in focus. And a Rel-specific extension or similar is what I meant by "a more versatile set of output operations". I hope that my command of English is good enough for this forum.

I wouldn't want to include people without phone numbers in a list of each person's phone numbers, because a not-has-a-phone-number isn't a phone number.

There can be more than one reason for the guy I am trying to contact not showing up in the contacts listing using the plain natural JOIN, assuming that the UI designer has done a good job in minimizing the number of clicks needed to reach the target by displaying phone numbers right next to the persons who have one, and I don't have to make an extra click to open all his contacts with e-mail, homepage and other irrelevant data:

- he doesn't have a phone;
- the guy hasn't even registered yet;
- ...

Depending on the UI implementation, the first scenario is a problem: assuming the above mentioned UI design, this guy's other contacts can't even be reached from this view. The second scenario can be a problem, too, if it can't be distinguished from the first one: sorry that I can't come up with a better example at the moment, but maybe he got stuck in a buggy UI, is too busy to find his way out, and I as developer should find out whether he needs troubleshooting? And based on his feedback improve the UI, of course.

A more obvious example would be a queue of something displayed in a column of a table. We have a tennis court on campus, so practice mate candidates could show up in such a column displaying requests for play for the bookings missing a player. I have a regular practice time on Saturdays when people are in the countryside, so if I used this table to find tennis partners then this column would often be empty at my row. A guy working on weekends and willing to play, however, would spot the empty rows first to avoid ending up in the waiting list as the page opens in his browser, so thanks to a LEFT JOIN output operator (I'll describe below what I mean by that) I would still have a good chance for a weekend tennis practice, had I not found an opponent at my level of play yet.

I would strongly prefer keeping the "matched" and "missing" parts separate as Erwin put it whenever the UNION of these lists need not be sorted which is seldom the case, unfortunately... Hence I am afraid that all of this clutter has a say for practical purposes, but at least not in D, of course, just a little Rel extension to support making the UI user-friendly.

These sound like UI issues rather than data management / database issues.

Though on the database side, it is fairly easy to turn phone numbers and not-has-a-phone-numbers into a single relation of missing-or-present-phone-number-for-display values, like the query at the end of post #3. But, as befits the data management rigour of a D (and unlike SQL) it's not going to allow a trivial conflation of phone numbers and not-has-a-phone-numbers.

Note that Rel is fundamentally a database management system. Creating end-user interfaces is, for the moment, outside its scope.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on November 25, 2018, 8:21 pm
Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

  • Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.
  • The first relation-valued attribute is the natural join (the matching tuples)
  • The second relation-valued attribute is the SEMIMINUS between the first and the second join operand
  • And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

 

Yes I noticed that too late.  You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards :-)

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

As for the "sentinel values", the first choice is whether to retain the original type, say PHONE_NO, implying you're forced to pretend that people with no known phone n° do have one, and moreover one that's at least technically valid.

An alternative is to not retain the original type, and use some other one, e.g. tagged unions (essentially "my" M() stuff), ZOO relations, or just type String.  The latter having the disadvantage that for the matching ones, you lose the ability to operate directly on the values because they no longer are values of type PHONE_NO.  In my experience (not that it's that extended but anyway) the former two have the disadvantage that setting up the base relations (i.e. transforming them appropriately using GROUP or whatever) can still be very tedious even with a powerful set of operators.  And it's very hard to mention everything relevant that has ever been said on the subject and yet remain succinct enough for things to fit in a readable post here.