The Third Manifesto ForumThe Third Manifesto: Tutorial DSupport for JOIN alternatives

Please or Register to create posts and topics.

Support for JOIN alternatives

#21 · November 27, 2018, 3:05 pm

Quote from Hugh on November 27, 2018, 3:05 pm

Quote from Dave Voorhis on November 27, 2018, 2:58 pm

Quote from Hugh on November 27, 2018, 2:32 pm

Quote from Dave Voorhis on November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I think that would better be RELATION{TUPLE {matches p JOIN q, missing p NOT MATCHING q}}, preferable to abandoning closure (and equivalent to the EXTEND TABLE_DEE version, of course).

Hugh

The TUPLE form avoids the missing FROM TUPLE FROM (p LEFTJOIN q) verbosity that potentially results (it's then just missing FROM (p LEFTJOIN q)), but I appreciate the point.

I think abandoning closure is only a minor concern, particularly as operators like COUNT(r) harmlessly break it.

Okay. I can always wrap p LEFTJOIN q in REL{ } myself, of course. You could also argue that the r3 in BS12's MERGE should have been a tuple rather than a relation of degree one, and I wouldn't disagree. We blindly assumed every operand and every result had to be a relation.

Hugh

Quote from Dave Voorhis on November 27, 2018, 2:58 pm

Quote from Hugh on November 27, 2018, 2:32 pm

Quote from Dave Voorhis on November 26, 2018, 4:00 pm

Quote from Erwin on November 26, 2018, 2:44 pm

Quote from Dave Voorhis on November 25, 2018, 8:21 pm

Quote from Erwin on November 25, 2018, 6:33 pm

... Another potentially useful idea is the following :

Define outer joins to return a relation, not of a bunch of attributes as they appear in the inputs, but of two or three relation-valued attributes.

The first relation-valued attribute is the natural join (the matching tuples)

The second relation-valued attribute is the SEMIMINUS between the first and the second join operand

And if it concerns a FULL outer join, the third relation-valued attribute is the SEMIMINUS between the second and the first operand.

That's what I suggested above in post #2 in this thread. In the TUPLE returned by LEFTJOIN, the matched attributed would be the result of the natural join, the missing attribute would be the result of the SEMIMINUS (aka NOT MATCHING).

An approach using sentinel values -- like 9999-12-13 -- has some practical utility in specific applications, but is a questionable general solution for the reasons you mentioned.

Yes I noticed that too late. You did say "missing SAME_TYPE_AS(q)" though which cannot be attributed to fat finger fault seeing how far p and q are apart on normal keyboards

Brief p LEFTJOIN q is shorthand for EXTEND TABLE_DEE {matching := p JOIN q , nonmatching := p NOT MATCHING q}.

Oops. Yes, I didn't mind my p's and q's.

An even shorter equivalence is that p LEFTJOIN q is -- or can be -- shorthand for TUPLE {matches p JOIN q, missing p NOT MATCHING q}.

I think that would better be RELATION{TUPLE {matches p JOIN q, missing p NOT MATCHING q}}, preferable to abandoning closure (and equivalent to the EXTEND TABLE_DEE version, of course).

Hugh

The TUPLE form avoids the missing FROM TUPLE FROM (p LEFTJOIN q) verbosity that potentially results (it's then just missing FROM (p LEFTJOIN q)), but I appreciate the point.

I think abandoning closure is only a minor concern, particularly as operators like COUNT(r) harmlessly break it.

Okay. I can always wrap p LEFTJOIN q in REL{ } myself, of course. You could also argue that the r3 in BS12's MERGE should have been a tuple rather than a relation of degree one, and I wouldn't disagree. We blindly assumed every operand and every result had to be a relation.

Hugh

Coauthor of The Third Manifesto and related books.

#22 · December 2, 2018, 2:26 pm

Quote from asulling on December 2, 2018, 2:26 pm

Quote from Dave Voorhis on November 26, 2018, 3:44 pm

These sound like UI issues rather than data management / database issues.

The trouble is that if the UI is highly CRUD-ish and lists displays are highly "tabular", then the distinction between "UI" and "DB" is a very very thin one ...

From a user's point of view, definitely. From a developer's point of view, partly. From a theoretical point of view, they couldn't be further apart.

Industrial applications may need to blur lines here, but I'm not sure Rel needs to.

Maybe I should elaborate my questions further. It seems analogous to me that it used to be the case that writing tests before rather than after producing the code that needs to be tested seemingly can't be further apart either. Is the Rel DBMS end user experience (or more accurately, that of the application developers, the men in the middle) considered to be so distant from the database layer that usability is still not a concern nowadays? I see that Tutorial D was not designed for such purposes, but did the optional extension previously mentioned remain in the "to do" list as the result of our discussion last weekend or rather not?

Surely the user model is based on criteria very distant from the data model and out of the developers' comfort zone, unfortunately, e.g. don't make me think or click (as the user is lazy, too) rather than keeping the foundations of relational algebra in mind, but a little layer/pattern of indirection would solve these problems (UI-considerate database output options rather than strictly database management or UI issues) with elegance:

https://en.wikipedia.org/wiki/Don%27t_Make_Me_Think

https://dl.acm.org/citation.cfm?id=3027111

Due to reusability concerns, such a DB adaptation/service layer belongs with the DBMS rather than the code querying it. If necessary then I can use your 5-line code in my project as a LEFTJOIN utility function, but that "solution" will be copied in other projects, potentially. Which implementations of D currently (or within the next few years as everyone is lazy) favor "blurring the lines" in general by providing an optional extension for improved usability?

Quote from Dave Voorhis on November 26, 2018, 3:44 pm

These sound like UI issues rather than data management / database issues.

The trouble is that if the UI is highly CRUD-ish and lists displays are highly "tabular", then the distinction between "UI" and "DB" is a very very thin one ...

From a user's point of view, definitely. From a developer's point of view, partly. From a theoretical point of view, they couldn't be further apart.

Industrial applications may need to blur lines here, but I'm not sure Rel needs to.

Maybe I should elaborate my questions further. It seems analogous to me that it used to be the case that writing tests before rather than after producing the code that needs to be tested seemingly can't be further apart either. Is the Rel DBMS end user experience (or more accurately, that of the application developers, the men in the middle) considered to be so distant from the database layer that usability is still not a concern nowadays? I see that Tutorial D was not designed for such purposes, but did the optional extension previously mentioned remain in the "to do" list as the result of our discussion last weekend or rather not?

Surely the user model is based on criteria very distant from the data model and out of the developers' comfort zone, unfortunately, e.g. don't make me think or click (as the user is lazy, too) rather than keeping the foundations of relational algebra in mind, but a little layer/pattern of indirection would solve these problems (UI-considerate database output options rather than strictly database management or UI issues) with elegance:

https://en.wikipedia.org/wiki/Don%27t_Make_Me_Think

https://dl.acm.org/citation.cfm?id=3027111

Due to reusability concerns, such a DB adaptation/service layer belongs with the DBMS rather than the code querying it. If necessary then I can use your 5-line code in my project as a LEFTJOIN utility function, but that "solution" will be copied in other projects, potentially. Which implementations of D currently (or within the next few years as everyone is lazy) favor "blurring the lines" in general by providing an optional extension for improved usability?

#23 · December 2, 2018, 7:45 pm

I've just implemented LEFT JOIN, RIGHT JOIN, and FULL JOIN in Rel per the discussions here. It will be available in the next update, which depending on how my free time shapes up over the next couple of weeks, will be released some time between a few days or a few weeks from now.

Tutorial D is designed to be a general-purpose language with database capabilities, or a database language with general-purpose programming capabilities. Rel is designed to allow users to use and explore Tutorial D plus experimental extensions. These extensions are typically based on discussions on this forum.

Over time, I will add enhancements to the Rel user interface to support creating reports, forms, and other facilities that are typical of end-user oriented database management systems.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#24 · December 3, 2018, 1:15 am

Quote from asulling on December 3, 2018, 1:15 am

Quote from Dave Voorhis on December 2, 2018, 7:45 pm

I've just implemented LEFT JOIN, RIGHT JOIN, and FULL JOIN in Rel per the discussions here. It will be available in the next update, which depending on how my free time shapes up over the next couple of weeks, will be released some time between a few days or a few weeks from now.

Thanks a lot for your time and effort! I'll try to integrate your DBMS in my little charity project that saved the clay court at Tallinn University of Technology, and depending on the results maybe consider using it for commercial purposes, too.

Quote from Dave Voorhis on December 2, 2018, 7:45 pm

Tutorial D is designed to be a general-purpose language with database capabilities, or a database language with general-purpose programming capabilities. Rel is designed to allow users to use and explore Tutorial D plus experimental extensions. These extensions are typically based on discussions on this forum.

Over time, I will add enhancements to the Rel user interface to support creating reports, forms, and other facilities that are typical of end-user oriented database management systems.

At the moment 8 commercial partners with more than 60 tennis courts in or near Tallinn have joined the system. We have players from nearly 10 countries (some of them are eager to try out the system at home, too), Java is a likely choice for the back end, and in addition to support for i18n at database level e.g. time interval comparison (like in PostgreSQL, an extension) would be needed for implementing scheduling functionality, probably. Currently I am operating just a little quick-and-dirty php/mysql hack from many years ago for financing purposes, however, trying to keep the system small and manageable as my software development skills catch up with the needs of our growing community and my plans for teamwork.

Maybe such a little overview can help you predict what we would really need of your DBMS over time.

Quote from Dave Voorhis on December 2, 2018, 7:45 pm

I've just implemented LEFT JOIN, RIGHT JOIN, and FULL JOIN in Rel per the discussions here. It will be available in the next update, which depending on how my free time shapes up over the next couple of weeks, will be released some time between a few days or a few weeks from now.

Thanks a lot for your time and effort! I'll try to integrate your DBMS in my little charity project that saved the clay court at Tallinn University of Technology, and depending on the results maybe consider using it for commercial purposes, too.

Quote from Dave Voorhis on December 2, 2018, 7:45 pm

Tutorial D is designed to be a general-purpose language with database capabilities, or a database language with general-purpose programming capabilities. Rel is designed to allow users to use and explore Tutorial D plus experimental extensions. These extensions are typically based on discussions on this forum.

Over time, I will add enhancements to the Rel user interface to support creating reports, forms, and other facilities that are typical of end-user oriented database management systems.

At the moment 8 commercial partners with more than 60 tennis courts in or near Tallinn have joined the system. We have players from nearly 10 countries (some of them are eager to try out the system at home, too), Java is a likely choice for the back end, and in addition to support for i18n at database level e.g. time interval comparison (like in PostgreSQL, an extension) would be needed for implementing scheduling functionality, probably. Currently I am operating just a little quick-and-dirty php/mysql hack from many years ago for financing purposes, however, trying to keep the system small and manageable as my software development skills catch up with the needs of our growing community and my plans for teamwork.

Maybe such a little overview can help you predict what we would really need of your DBMS over time.

#25 · February 16, 2019, 6:11 am

Quote from AntC on February 16, 2019, 6:11 am

Quote from asulling on November 24, 2018, 11:54 am

Is cohesive information output from a D-compatible database for practical purposes considered respectable and interesting, too? If not then why?

Sorry if I am missing something basic here, but I can't find the D "equivalent" of the SQL outer JOIN. I feel, however, that it is the responsibility of a relational database to keep all information about the entities in it, the relations between them as well as the entities missing from these relations ...

Thank you asulling, I think you have the beginnings of an excellent approach. To cover some of your preliminaries first ...

I see in your schema below that you're using keys and foreign key constraints to express "cohesive information" and "relations between ... entities". Good.

Then you're also in effect "keep[ing] all information ... as well as entities missing". You'll find a general distaste in TTM for the term 'missing information' (it's usually in scare quotes) as if the database used to have or should have that information, but somehow it's mislaid. A more likely explanation in case of a person without phone is that the database never had that information.

Then I take your question to be focussed on "output ... for practical purposes". IOW a D-compatible database does (respectably) capture the data and its cohesiveness. You're asking about various ways to represent the data to end-users, maybe who are only inspecting display screens provided by a UI developer, not themselves writing queries over the data/its relational structure.

For the "D equivalent" of outer join, I guess you could refer to a paper/chapter by Hugh Darwen in D&D's writings 'Outer Join with No Nulls and Fewer Tears' referenced from here, slide 3. I've never been very keen on that approach, for reasons I'll explain below. (Some of the response to your o.p. also touched on it.)

Rel ... I haven't found an elegant (declarative) way of getting the entities missing from the relations in the result set out of it yet, so any help in this regard would be appreciated, too.

You're already more than half way there, by using Relation-Valued Attributes (RVAs) in your result set. Rel has a neat browser that shows RVAs in clear style, with empty RVA attribute values very obvious.

Example in Tutorial D. ...,

a GROUP-clause (not in the SQL sense) for JOIN could be introduced in D ...

There are various operations you could use in Tutorial D, to achieve what you're after. Since you ask: yes that includes a GROUP operation (not in the SQL sense), which I would 'show off' for this query, except it would be overkill. (There are more advanced operators that could make this code even more succinct. I'll avoid those because it's not so easy to find info about them readily to hand.)

Does Tutorial D support a more compact alternative to the SQL "X left join Y" clause ...

There's not an out-of-the-box operation. There are tools to build one, as you've pretty much discovered.

Firstly re your "more compact ... [than] SQL ... left join":

SQL left outer joins are reasonably compact where the cardinality of the join left:right is 0-or-1: 1. You get one row in the output for each one from the right input. The columns to the left contain either values from the left input or are null. (The display format might or might not show nulls in a way you can tell them from empty strings or zero numbers; but that's a separate consideration. The Darwen approach I cited above also suffers this effect: instead of nulls with at least a hope of displaying them differently, it requires you supply default values. Those values must be legitimate values for those columns, so now you can't tell whether a cell is blank because there was no tuple to join to, or because there was and its column's value is blank. Similarly for default dates like 9999-12-31, as was mentioned in the thread.)

But if the cardinality is 0-or-1-or-more: 1, as with your phone example, SQL shows more rows, with the right input repeated on each row. This is not so compact, and downright confusing. The Darwen approach I cited above also suffers this.

Then the approach you've sketched with RVAs (as displayed in Rel) is excellent: for the 1 side of the join you get 1 row on the output. If the outer side has 0 matches, you get an empty box displayed for the RVA. If the outer side has many tuples, you get many rows inside the box.

... handing the output over to the UI? How can the following code be simplified?

The way to hand over output is as a relation IMO, which is what your code achieves. The UI presentation had better be able to cope with RVAs. Keeping with your schema (retained here for reference), I'd code the query as follows. (See explanations afterwards for operations I've used.)

VAR person REAL RELATION {name CHARACTER} KEY {name};
VAR phone REAL RELATION {number INTEGER, owner CHARACTER} KEY {number};
CONSTRAINT phone_fkey_person phone { owner } RENAME { owner AS name } <= person { name };
person := ...;
phone := ...;
...

VAR leftjoin REAL RELATION {name CHARACTER, phones RELATION {number INTEGER}} KEY {name};

leftjoin := person EXTEND { phones := ( (phone WHERE owner = name) {ALL BUT owner} ) };

Notes:

I've written this in a general style, such that the code would not need to change if relation person had more attributes than {name} or relation phone had more attributes than {owner, number}. That's why I've used 'project away' {ALL BUT owner} rather than 'project' {number}. (I am presuming there would be no clashes of same-named attributes between the two relations.)

The assignment to leftjoin takes the whole of relation person and extends each tuple with extra attributes (something like calculated columns in a SQL SELECT.)

Specifically, for each name in person, match to the owner in phone. (Luckily in this case, the base relations have different attribute names for the same name/owner. Often this step would need a rename.)

Having matched, project away the owner. (We no longer need it because it's the same value as name in the outer person relation.)

Notice that the WHERE is a relational operator: it takes a relation operand (phone) and returns a relation result (being phone restricted to those tuples with a matching owner for each tuple in the outer relation person). Or if there's no match to phone, the WHERE will return an empty relation.

That relation from the WHERE becomes the value for attribute phones.

Having obtained relation leftjoin, if you want " the entities missing from the relations in the result set " (I'll hold my nose with that 'missing' ;-):

leftjoin WHERE phones = RELATION{} {ALL BUT phones};

That is, tuples in leftjoin with the phones RVA containing an empty relation. Then project away the useless attribute.

BTW if you're feeling bamboozled by operations on whole-relations 'inside' tuple-level expressions like EXTEND, I see your schema already includes a whole-relation operation: the <= in your foreign key constraint is a whole-relation subset comparison.

Quote from asulling on November 24, 2018, 11:54 am

Is cohesive information output from a D-compatible database for practical purposes considered respectable and interesting, too? If not then why?

Sorry if I am missing something basic here, but I can't find the D "equivalent" of the SQL outer JOIN. I feel, however, that it is the responsibility of a relational database to keep all information about the entities in it, the relations between them as well as the entities missing from these relations ...

Thank you asulling, I think you have the beginnings of an excellent approach. To cover some of your preliminaries first ...

I see in your schema below that you're using keys and foreign key constraints to express "cohesive information" and "relations between ... entities". Good.
Then you're also in effect "keep[ing] all information ... as well as entities missing". You'll find a general distaste in TTM for the term 'missing information' (it's usually in scare quotes) as if the database used to have or should have that information, but somehow it's mislaid. A more likely explanation in case of a person without phone is that the database never had that information.
Then I take your question to be focussed on "output ... for practical purposes". IOW a D-compatible database does (respectably) capture the data and its cohesiveness. You're asking about various ways to represent the data to end-users, maybe who are only inspecting display screens provided by a UI developer, not themselves writing queries over the data/its relational structure.

For the "D equivalent" of outer join, I guess you could refer to a paper/chapter by Hugh Darwen in D&D's writings 'Outer Join with No Nulls and Fewer Tears' referenced from here, slide 3. I've never been very keen on that approach, for reasons I'll explain below. (Some of the response to your o.p. also touched on it.)

Rel ... I haven't found an elegant (declarative) way of getting the entities missing from the relations in the result set out of it yet, so any help in this regard would be appreciated, too.

You're already more than half way there, by using Relation-Valued Attributes (RVAs) in your result set. Rel has a neat browser that shows RVAs in clear style, with empty RVA attribute values very obvious.

Example in Tutorial D. ...,

a GROUP-clause (not in the SQL sense) for JOIN could be introduced in D ...

There are various operations you could use in Tutorial D, to achieve what you're after. Since you ask: yes that includes a GROUP operation (not in the SQL sense), which I would 'show off' for this query, except it would be overkill. (There are more advanced operators that could make this code even more succinct. I'll avoid those because it's not so easy to find info about them readily to hand.)

Does Tutorial D support a more compact alternative to the SQL "X left join Y" clause ...

There's not an out-of-the-box operation. There are tools to build one, as you've pretty much discovered.

Firstly re your "more compact ... [than] SQL ... left join":

SQL left outer joins are reasonably compact where the cardinality of the join left:right is 0-or-1: 1. You get one row in the output for each one from the right input. The columns to the left contain either values from the left input or are null. (The display format might or might not show nulls in a way you can tell them from empty strings or zero numbers; but that's a separate consideration. The Darwen approach I cited above also suffers this effect: instead of nulls with at least a hope of displaying them differently, it requires you supply default values. Those values must be legitimate values for those columns, so now you can't tell whether a cell is blank because there was no tuple to join to, or because there was and its column's value is blank. Similarly for default dates like 9999-12-31, as was mentioned in the thread.)
But if the cardinality is 0-or-1-or-more: 1, as with your phone example, SQL shows more rows, with the right input repeated on each row. This is not so compact, and downright confusing. The Darwen approach I cited above also suffers this.
Then the approach you've sketched with RVAs (as displayed in Rel) is excellent: for the 1 side of the join you get 1 row on the output. If the outer side has 0 matches, you get an empty box displayed for the RVA. If the outer side has many tuples, you get many rows inside the box.

... handing the output over to the UI? How can the following code be simplified?

The way to hand over output is as a relation IMO, which is what your code achieves. The UI presentation had better be able to cope with RVAs. Keeping with your schema (retained here for reference), I'd code the query as follows. (See explanations afterwards for operations I've used.)

VAR person REAL RELATION {name CHARACTER} KEY {name};
VAR phone REAL RELATION {number INTEGER, owner CHARACTER} KEY {number};
CONSTRAINT phone_fkey_person phone { owner } RENAME { owner AS name } <= person { name };
person := ...;
phone := ...;
...

VAR leftjoin REAL RELATION {name CHARACTER, phones RELATION {number INTEGER}} KEY {name};

leftjoin := person EXTEND { phones := ( (phone WHERE owner = name) {ALL BUT owner} ) };

Notes:

I've written this in a general style, such that the code would not need to change if relation person had more attributes than {name} or relation phone had more attributes than {owner, number}. That's why I've used 'project away' {ALL BUT owner} rather than 'project' {number}. (I am presuming there would be no clashes of same-named attributes between the two relations.)
The assignment to leftjoin takes the whole of relation person and extends each tuple with extra attributes (something like calculated columns in a SQL SELECT.)
Specifically, for each name in person, match to the owner in phone. (Luckily in this case, the base relations have different attribute names for the same name/owner. Often this step would need a rename.)
Having matched, project away the owner. (We no longer need it because it's the same value as name in the outer person relation.)
Notice that the WHERE is a relational operator: it takes a relation operand (phone) and returns a relation result (being phone restricted to those tuples with a matching owner for each tuple in the outer relation person). Or if there's no match to phone, the WHERE will return an empty relation.
That relation from the WHERE becomes the value for attribute phones.

Having obtained relation leftjoin, if you want " the entities missing from the relations in the result set " (I'll hold my nose with that 'missing' ;-):

leftjoin WHERE phones = RELATION{} {ALL BUT phones};

That is, tuples in leftjoin with the phones RVA containing an empty relation. Then project away the useless attribute.

BTW if you're feeling bamboozled by operations on whole-relations 'inside' tuple-level expressions like EXTEND, I see your schema already includes a whole-relation operation: the <= in your foreign key constraint is a whole-relation subset comparison.

#26 · June 12, 2019, 10:52 pm

Overall I think the union type generator Maybe is probably a win, but I haven't fully explored it yet, even mentally.

The Forum for Discussion about The Third Manifesto and Related Matters

Support for JOIN alternatives