Support for JOIN alternatives

#11 · November 26, 2018, 2:48 pm

Quote from Erwin on November 26, 2018, 2:48 pm

Quote from Dave Voorhis on November 26, 2018, 1:08 pm

Quote from asulling on November 26, 2018, 6:25 am

Thanks for updating the Rel development "to do" list and making things clearer to me! As I mentioned in the Rel forum, I am relatively new to Rel, Tutorial D and the underlying theory. I am considering to use the Rel DBMS in a project, trying it out. And I am extremely slow at responding, unfortunately.

By "cohesive information output" (sorry about lacking accuracy) I meant that e.g. not-has-a-phone-number might be useful information outside the database although not a phone number, and it sticks with the person in focus. And a Rel-specific extension or similar is what I meant by "a more versatile set of output operations". I hope that my command of English is good enough for this forum.

I wouldn't want to include people without phone numbers in a list of each person's phone numbers, because a not-has-a-phone-number isn't a phone number.

There can be more than one reason for the guy I am trying to contact not showing up in the contacts listing using the plain natural JOIN, assuming that the UI designer has done a good job in minimizing the number of clicks needed to reach the target by displaying phone numbers right next to the persons who have one, and I don't have to make an extra click to open all his contacts with e-mail, homepage and other irrelevant data:

- he doesn't have a phone;
- the guy hasn't even registered yet;
- ...

Depending on the UI implementation, the first scenario is a problem: assuming the above mentioned UI design, this guy's other contacts can't even be reached from this view. The second scenario can be a problem, too, if it can't be distinguished from the first one: sorry that I can't come up with a better example at the moment, but maybe he got stuck in a buggy UI, is too busy to find his way out, and I as developer should find out whether he needs troubleshooting? And based on his feedback improve the UI, of course.

A more obvious example would be a queue of something displayed in a column of a table. We have a tennis court on campus, so practice mate candidates could show up in such a column displaying requests for play for the bookings missing a player. I have a regular practice time on Saturdays when people are in the countryside, so if I used this table to find tennis partners then this column would often be empty at my row. A guy working on weekends and willing to play, however, would spot the empty rows first to avoid ending up in the waiting list as the page opens in his browser, so thanks to a LEFT JOIN output operator (I'll describe below what I mean by that) I would still have a good chance for a weekend tennis practice, had I not found an opponent at my level of play yet.

I would strongly prefer keeping the "matched" and "missing" parts separate as Erwin put it whenever the UNION of these lists need not be sorted which is seldom the case, unfortunately... Hence I am afraid that all of this clutter has a say for practical purposes, but at least not in D, of course, just a little Rel extension to support making the UI user-friendly.

These sound like UI issues rather than data management / database issues.

The trouble is that if the UI is highly CRUD-ish and lists displays are highly "tabular", then the distinction between "UI" and "DB" is a very very thin one ...

Quote from Dave Voorhis on November 26, 2018, 1:08 pm

Quote from asulling on November 26, 2018, 6:25 am

Thanks for updating the Rel development "to do" list and making things clearer to me! As I mentioned in the Rel forum, I am relatively new to Rel, Tutorial D and the underlying theory. I am considering to use the Rel DBMS in a project, trying it out. And I am extremely slow at responding, unfortunately.

By "cohesive information output" (sorry about lacking accuracy) I meant that e.g. not-has-a-phone-number might be useful information outside the database although not a phone number, and it sticks with the person in focus. And a Rel-specific extension or similar is what I meant by "a more versatile set of output operations". I hope that my command of English is good enough for this forum.

I wouldn't want to include people without phone numbers in a list of each person's phone numbers, because a not-has-a-phone-number isn't a phone number.

There can be more than one reason for the guy I am trying to contact not showing up in the contacts listing using the plain natural JOIN, assuming that the UI designer has done a good job in minimizing the number of clicks needed to reach the target by displaying phone numbers right next to the persons who have one, and I don't have to make an extra click to open all his contacts with e-mail, homepage and other irrelevant data:

- he doesn't have a phone;
- the guy hasn't even registered yet;
- ...

Depending on the UI implementation, the first scenario is a problem: assuming the above mentioned UI design, this guy's other contacts can't even be reached from this view. The second scenario can be a problem, too, if it can't be distinguished from the first one: sorry that I can't come up with a better example at the moment, but maybe he got stuck in a buggy UI, is too busy to find his way out, and I as developer should find out whether he needs troubleshooting? And based on his feedback improve the UI, of course.

A more obvious example would be a queue of something displayed in a column of a table. We have a tennis court on campus, so practice mate candidates could show up in such a column displaying requests for play for the bookings missing a player. I have a regular practice time on Saturdays when people are in the countryside, so if I used this table to find tennis partners then this column would often be empty at my row. A guy working on weekends and willing to play, however, would spot the empty rows first to avoid ending up in the waiting list as the page opens in his browser, so thanks to a LEFT JOIN output operator (I'll describe below what I mean by that) I would still have a good chance for a weekend tennis practice, had I not found an opponent at my level of play yet.

I would strongly prefer keeping the "matched" and "missing" parts separate as Erwin put it whenever the UNION of these lists need not be sorted which is seldom the case, unfortunately... Hence I am afraid that all of this clutter has a say for practical purposes, but at least not in D, of course, just a little Rel extension to support making the UI user-friendly.

These sound like UI issues rather than data management / database issues.

The trouble is that if the UI is highly CRUD-ish and lists displays are highly "tabular", then the distinction between "UI" and "DB" is a very very thin one ...

Author of SIRA_PRISE

#12 · November 26, 2018, 3:44 pm

Quote from Erwin on November 26, 2018, 2:48 pm

Quote from Dave Voorhis on November 26, 2018, 1:08 pm

These sound like UI issues rather than data management / database issues.

The trouble is that if the UI is highly CRUD-ish and lists displays are highly "tabular", then the distinction between "UI" and "DB" is a very very thin one ...

From a user's point of view, definitely. From a developer's point of view, partly. From a theoretical point of view, they couldn't be further apart.

Industrial applications may need to blur lines here, but I'm not sure Rel needs to.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#13 · November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#14 · November 26, 2018, 11:38 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

This is one of many places where there is much to be said for defining a type that specifies a 'distinguished value'. The DV is a valid member of the type, has a literal representation and is valid for comparisons, but may have special behaviour if used as argument to an operator. A well known example is NaN. A DATE type is one that often needs to have a DV.

But that is not to say that the 'outer join' operator should treat it specially -- it simply provides a convenient default value to represent 'unknown'.

Andl - A New Database Language - andl.org

#15 · November 27, 2018, 7:38 am

Quote from Dave Voorhis on November 27, 2018, 7:38 am

Quote from dandl on November 26, 2018, 11:38 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

This is one of many places where there is much to be said for defining a type that specifies a 'distinguished value'. The DV is a valid member of the type, has a literal representation and is valid for comparisons, but may have special behaviour if used as argument to an operator. A well known example is NaN. A DATE type is one that often needs to have a DV.

But that is not to say that the 'outer join' operator should treat it specially -- it simply provides a convenient default value to represent 'unknown'.

A sentinel or distinguished value is only acceptable when you can guarantee that every use of the given type will be restricted to a subset of possible values of the type, such that every distinguished value lies outside that subset.

It's also acceptable when the type inherently contains distinguished values such as NaN or the empty string. Though, even NaN is questionable, because a given NaN won't distinguish an intentional data-entry sentinel from a calculation result, and for every sentinel Date value you choose, there will be some application or user that considers it to be a valid date or a valid "date".

The first approach -- sentinel or distinguished values chosen from outside a subset of possible type values -- can only ever be application-specific. It cannot be generalised, because for every sentinel or distinguished value you choose, there will be some application -- or some users of an application -- that will consider it to be a valid for some purpose.

A better general approach is to define sentinel or distinguished values as distinct components of a tagged-union type rather than choosing them -- typically rather arbitrarily -- from an otherwise undistinguished non- tagged-union type.

Quote from dandl on November 26, 2018, 11:38 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

This is one of many places where there is much to be said for defining a type that specifies a 'distinguished value'. The DV is a valid member of the type, has a literal representation and is valid for comparisons, but may have special behaviour if used as argument to an operator. A well known example is NaN. A DATE type is one that often needs to have a DV.

But that is not to say that the 'outer join' operator should treat it specially -- it simply provides a convenient default value to represent 'unknown'.

A sentinel or distinguished value is only acceptable when you can guarantee that every use of the given type will be restricted to a subset of possible values of the type, such that every distinguished value lies outside that subset.

It's also acceptable when the type inherently contains distinguished values such as NaN or the empty string. Though, even NaN is questionable, because a given NaN won't distinguish an intentional data-entry sentinel from a calculation result, and for every sentinel Date value you choose, there will be some application or user that considers it to be a valid date or a valid "date".

The first approach -- sentinel or distinguished values chosen from outside a subset of possible type values -- can only ever be application-specific. It cannot be generalised, because for every sentinel or distinguished value you choose, there will be some application -- or some users of an application -- that will consider it to be a valid for some purpose.

A better general approach is to define sentinel or distinguished values as distinct components of a tagged-union type rather than choosing them -- typically rather arbitrarily -- from an otherwise undistinguished non- tagged-union type.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#16 · November 27, 2018, 10:19 am

A DV is not a sentinel. It is precisely a value that is different from and cannot be confused with any other values for the type and cannot be generated accidentally. For example, the values for the type YEARDATE might be defined to be -9999..-1, 1..9999, NOYEAR. This type has exactly 19999 values: there is no year zero and there is a year called NOYEAR. (Yes, the physical representation of NOYEAR might be an integer that is zero, smaller or larger than the other values but that is of no consequence.)

Yes, it is also possible to use union types to achieve a similar effect.

Andl - A New Database Language - andl.org

#17 · November 27, 2018, 10:48 am

Quote from dandl on November 27, 2018, 10:19 am

A DV is not a sentinel. It is precisely a value that is different from and cannot be confused with any other values for the type and cannot be generated accidentally. For example, the values for the type YEARDATE might be defined to be -9999..-1, 1..9999, NOYEAR. This type has exactly 19999 values: there is no year zero and there is a year called NOYEAR. (Yes, the physical representation of NOYEAR might be an integer that is zero, smaller or larger than the other values but that is of no consequence.)

Yes, it is also possible to use union types to achieve a similar effect.

"Distinguished value" is an informal term for what is more commonly known as a sentinel value. Use of sentinel values should be unconditionally deprecated. Per https://en.wikipedia.org/wiki/Sentinel_value, "In safe languages, most uses of sentinel values could be replaced with option types, which enforce explicit handling of the exceptional case."

Using values like 9999 to indicate a non-year is workable until some user defines "year" 9999 to have meaning outside the system that conflicts with its meaning inside the system.

A year value like NOTYEAR is presumably already a union of a non-decimal numeric type and NOTYEAR, given "NOTYEAR" isn't a numeric value.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#18 · November 27, 2018, 2:25 pm

TTM does not prohibit inclusion of a suitable shorthand that provides some kind of "outer" join capability.

Tutorial D includes no such operator because it wasn't really needed for the purposes for which that language was designed. There would be no objection to a propriety implementation of Tutorial D providing extra shorthands, so long as no RM Pres or Pros are violated.

Business System 12 did provide a kind of left join, calling it MERGE, and I wrote about it a long time ago in "Outer Join with No Nulls and Fewer Tears" (Relational Database Writings 1989-91 by C.J. Date with Hugh Darwen, Chapter 20). MERGE(r1, r2, r3), where r1 and r2 are joinable relations and r3 is a relation of cardinality one and heading that of r2 minus the common attributes of r1 and r2, gives (r1 JOIN r2) UNION ((r1 NOT MATCHING r2) JOIN r3). Thus, the single tuple in r3 provides values to fill the gaps (where SQL's LEFT JOIN would place NULL).

Of course there are other ways of addressing the same requirement, some of which are explored in our book Database Explorations, Part IV "Missing Information". Those that use relation-valued attributes were not available in Business System 12 because in those days (either side of 1980) we had gone along with Codd in believing RVAs were not needed.

Hugh

Coauthor of The Third Manifesto and related books.

#19 · November 27, 2018, 2:32 pm

Quote from Dave Voorhis on November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I think that would better be RELATION{TUPLE {matches p JOIN q, missing p NOT MATCHING q}}, preferable to abandoning closure (and equivalent to the EXTEND TABLE_DEE version, of course).

Hugh

Coauthor of The Third Manifesto and related books.

#20 · November 27, 2018, 2:58 pm

Quote from Dave Voorhis on November 27, 2018, 2:58 pm

Quote from Hugh on November 27, 2018, 2:32 pm

Quote from Dave Voorhis on November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I think that would better be RELATION{TUPLE {matches p JOIN q, missing p NOT MATCHING q}}, preferable to abandoning closure (and equivalent to the EXTEND TABLE_DEE version, of course).

Hugh

The TUPLE form avoids the missing FROM TUPLE FROM (p LEFTJOIN q) verbosity that potentially results (it's then just missing FROM (p LEFTJOIN q)), but I appreciate the point.

I think abandoning closure is only a minor concern, particularly as operators like COUNT(r) harmlessly break it.

Quote from Hugh on November 27, 2018, 2:32 pm

Quote from Dave Voorhis on November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I think that would better be RELATION{TUPLE {matches p JOIN q, missing p NOT MATCHING q}}, preferable to abandoning closure (and equivalent to the EXTEND TABLE_DEE version, of course).

Hugh

The TUPLE form avoids the missing FROM TUPLE FROM (p LEFTJOIN q) verbosity that potentially results (it's then just missing FROM (p LEFTJOIN q)), but I appreciate the point.

I think abandoning closure is only a minor concern, particularly as operators like COUNT(r) harmlessly break it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

The Forum for Discussion about The Third Manifesto and Related Matters