The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Why do we need natural joins?

PreviousPage 3 of 5Next

Whereas, the smug Lisp weenie says, dynamically typed languages are perfectly suited to the TTM type model because they have no problem being "meta": a tuple is a mapping from names to objects, a relation is a mapping from names to type predicates plus a set of tuples.

Quote from johnwcowan on November 4, 2019, 2:46 am
Quote from dandl on November 4, 2019, 2:06 am

And the difference: you cannot write or implement a natural join in Java (except by reflection or code generation). You can write and implement a theta join.

Okay, I understand now.  But I don't see how you can do θ-joins either in Java without reflection/generation.  To do it at the method call level, you need Attribute objects containing Strings and Classes that you can pass around, which means you can't represent tuples as POJOs: they would have to be of type Map<Attribute, Object> and can't be used without either explicit casting or reflective castAs.

Can you sketch what you have in mind?

A natural join on the D&D supplier database looks like this:

S join SP

The Codd theta join looks like this:

S[Sid=Sid]SP(Sid SNAME STATUS CITY Pid QTY) // projects away the duplicate Sid

The SQL version looks roughly like this:

select S.Sid, SNAME, STATUS, CITY, Pid, QTY from S,  SP where S.Sid = SP.Sid

The C# LINQ version looks like this:

S.Join(SP, (s,sp) => s.Sid = sp.Sid, (s,sp) => new { Sid=s.Sid, SNAME=s.SNAME, STATUS=s.STATUS, CITY=s.CITY, Pid=sp.Pid, QTY=sp.QTY })

I don't know Java streams so I can't tell you, but I'm sure it looks a lot like the C#. The Join method takes two function arguments, one to do the comparison, the other to fabricate the new record. Obviously it can do COMPOSE, MATCHING etc. NOT MATCHING is a different method call.

 

Andl - A New Database Language - andl.org
Quote from AntC on November 4, 2019, 2:57 am

John's correct/the causation is the other way round: the (ahem) natural choice of relations-as-sets, headings-as-sets lead to the Prescriptions defining a relation (value); and that leads to Natural JOIN being the (ahem) natural expression of FOPL AND for the predicates of which the relation values are extension.

I won't argue about whether the toast came first or the marmalade, they go well together. I'm asking, if there turns out to be a problem with marmalade (the American stuff is awful), what might we get if we start with tortillas?

Then tell me what purpose such an arcane type system

I see no "arcane" type system. The TTM type system is easy to describe (leaving out the IM). I see a lot of benighted users of type-systems in currently popular languages that seem to have their understandings crippled.

Arcane: "Known or understood by only a few". It's definitely arcane, regardless of how easy it is to learn.

Benighted: "Being in a state of moral or intellectual darkness; unenlightened." Maybe, but that's the way they like it.

serves other than to enable natural joins?

and disjunctions and relative complements (NOT MATCHING) and projections and restrictions-as-joins (ref Appendix A) and ... pretty much every operation you'd want in an expressive query language/algebra.

Of course, but the argument is the same. We're talking language design and practical implementations on reasonably widely used compiled languages, and all those things follow wherever join goes.

Why not choose a record type similar to C, PL/I and Pascal, other than the difficulty of describing natural joins in this languages?

The difficulty of "describing natural joins" is just a microcosm of the difficulty of the 'positional perspective' in general: for example, how do you specify the resulting column positions from a Cartesian Product? Remember your operations are in general nested, so that has to work for a Cartesian product whose operands are Cartesian products, etc, etc. Look how much of the SQL standard is preoccupied by trying to specify positional access to columns.

Perhaps I should be more explicit: the result of Natural JOIN is a relation value/the operation is 'closed over' relations. If your positional access has to say: this is a 4-ply deep JOIN/product, so you need 3-dot suffixes to access nested attributes, then you're working with values that are not relations/you have non-uniformity of representations.

I really don't see the problem. Can you show me some code? My guess is no matter what you write I can translate it mechanically into LINQ-style code, which will compile and run.

and that leads to a dead end in terms of irreconcilable type systems. If they made a different choice (such as theta joins), could that lead to a TTM Mk II with a different kind of type system, more like that found in modern statically typed programming languages? Is it possible to throw out the tuple and relation type generators as written, replace them by something different, and keep all the other useful work?

Type generators are just generic types by another name, cf. Haskell.   I don't understand the supposed difference between TTM and a subset of Java in which all objects are immutable.

I think the better term is a 'parameterised type'. And the difference: you cannot write or implement a natural join in Java (except by reflection or code generation). You can write and implement a theta join.

Can you? What if the operands happen to have some attribute name in common? Or are you excluding that possibility? Then how do you express in Java's type system: these collections must have no attributes in common? How do you write a Cartesian product in Java? How do you write a theta-selection over a Cartesian product?

You can't express any of those things, Java's type system is way too simple. But they are not requirement for TTM and you can most certainly write a Cartesian product in C# LINQ, and a selection over that. See samples posted elsewhere.

 

 

Andl - A New Database Language - andl.org
Quote from johnwcowan on November 4, 2019, 3:04 am

Whereas, the smug Lisp weenie says, dynamically typed languages are perfectly suited to the TTM type model because they have no problem being "meta": a tuple is a mapping from names to objects, a relation is a mapping from names to type predicates plus a set of tuples.

TTM assumes dynamically typed languages are rejected, for performance and safety reasons.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on November 4, 2019, 6:54 am

Can you? What if the operands happen to have some attribute name in common? Or are you excluding that possibility? Then how do you express in Java's type system: these collections must have no attributes in common? How do you write a Cartesian product in Java? How do you write a theta-selection over a Cartesian product?

You can't express any of those things, Java's type system is way too simple. But they are not requirement for TTM and you can most certainly write a Cartesian product in C# LINQ, and a selection over that. See samples posted elsewhere.

Are you claiming .NET LINQ can do something vis-a-vis Cartesian product of collections that Java Streams can't?

They appear to be essentially equivalent in capability, being based on essentially the same language and type system. Differences are negligible. C#'s LINQ query syntax might look friendlier compared to LINQ method syntax or Java Streams, but the latter two are notionally equivalent.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on November 3, 2019, 11:45 pm
Quote from Dave Voorhis on November 3, 2019, 4:29 pm

You seem to be echoing what I'd suggested elsewhere -- such as https://forum.thethirdmanifesto.com/forum/topic/how-to-resolve-the-type-system-relational-algebra-mismatch/#postid-985925 -- in terms of using Java Streams or .NET LINQ, but I'm not clear how some mainly-syntactic tweaks to JOIN (i.e., implicit matching of attributes based on common names vs explicit matching of attributes whether names are common or not) now make LINQ/Streams usable in a way they apparently weren't before. I already know they're usable -- I've made extensive use of LINQ and Streams -- and you probably have too. Using a (dynamically generated & compiled) class to represent a tuple type is what I'm doing in my datasheet project, as I've noted several times.

Whilst neither Streams nor LINQ are a relational algebra per se, they unquestionably embody its essential strength -- i.e., a set of data structures abstracted under a unified interface and an algebra specifying transformations to values/instances of that interface.

It's obviously a closely related theme, but what I'm saying is: D&D made a choice (natural join) in writing TTM, the choice led them to a certain kind of type system (with type generators and headings), and that leads to a dead end in terms of irreconcilable type systems. If they made a different choice (such as theta joins), could that lead to a TTM Mk II with a different kind of type system, more like that found in modern statically typed programming languages? Is it possible to throw out the tuple and relation type generators as written, replace them by something different, and keep all the other useful work? By my count, RM Pre 1-5, 8, 11-26 would survive with little or no change. The others would need to be rewritten.

I quoted LINQ-like code to show one possible approach, but with a different TTM type system as the starting point, there might be any number of ways to get there.

LINQ is only a small part of the solution. TTM aims to solve a bigger problem than just the type system and the RA.

As others have noted, the choice of natural join is the result of making choices about RA and type system design. The TTM RA and type system was not a result of making a choice to use natural join.

I don't see how the choice of natural join or not relates to tuple and relation type generators. The use of tuple and relation type generators seem more related to specifying non-scalar literals, variable and parameter types -- and use of UNION, INTERSECT and MINUS, etc. -- than they do to natural join.

Of course, there are various ways to implement an RA -- or a non-RA collections framework and associated algebra such as .NET LINQ or Java Streams -- but these are not TTM. That doesn't mean they wouldn't be equally valid or useful.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on November 4, 2019, 1:54 am
Quote from p c on November 3, 2019, 11:48 am

A mistake which seems to be universal is reading Codd as if he is prescribing language syntax when he defines operators in the abstract. So you get all these languages that implement the wrong idea that his explication of natural join must mean that the expression (R Join S) means that the names R and S represent joinable relations which they do not.

Please be aware that I am specifically referring to Codd's early writings, up to 1972. I find no mention of an expression in that form in his writings.

When named tables are assumed to be variables, or named relvars are assumed the mistake is only propagated.

Codd 1970 says: "A binary relation R is joinable with a binary relation S if there exists a ternary relation U such that a binary relation R is joinable with a binary relation S if there exists a ternary relation U such that pi1,2 (U) = R and pi2,3 (U) = S. Any such ternary relation is called a join of R with pi1,2 (U) = R and pi2,3 (U) = S. Any such ternary relation is called a join of R with S".

Where? I don't see any references to a ternary relation in my (limited) sources. And I find it hard to accept that as a direct quote, given the duplication in the wording.

...

The copy-and-paste from an Adobe pdf got mangled, perhaps by me trying to overcome the incompatibility with the Adobe font for the "pi" symbol. Anyway, here is a correction:

"A binary relation R is joinable with a binary relation S if there exists a ternary relation U such that pi 1,2 (U) = R and pi 2,3 (U) = S. Any such ternary relation is called a join of R with S. If R, S are binary relations such that pi 2 (R) = pi 1 (S), then R is joinable with S. One join that always exists in such a case is the natural join of R with S defined by ..."

I can't explain why you wouldn't find "ternary relation" in the pdf of Codd 1970. It appears in your quote. Maybe you are also looking at an edited copy that has also been mangled.

Quote from AntC on November 3, 2019, 8:55 pm

 

Furthermore JOIN is the natural generalisation of INTERSECT when the operands don't have identical headings; and the natural generalisation of TIMES when the operands don't have disjoint headings.

It subsumes selection operator as well. I'm puzzled why natural join generalizes 3 concepts, while inner union only 2...

Database practice, though, is messier than theory. If natural join is considered as a foundation of some query language, then what is the method of handling columns like CREATED_BY and CREATED_ON? Is RDBMS supposed to automatically generate "end-user-query-oriented" views that exclude such columns? Or every transaction is recorded with provenance information so that tuple modification date and user can be queried separately without having them in every table. Still, this design seems to be untrustworthy when adding columns to the schema breaks existing queries. Are there more examples of the columns similar to CREATED_BY and CREATED_ON?

 

UPDATED_BY and UPDATED_ON are obvious candidates.   This is (part of) what triggers are for.

Quote from Vadim on November 4, 2019, 6:24 pm
Quote from AntC on November 3, 2019, 8:55 pm

Furthermore JOIN is the natural generalisation of INTERSECT when the operands don't have identical headings; and the natural generalisation of TIMES when the operands don't have disjoint headings.

It subsumes selection operator as well. I'm puzzled why natural join generalizes 3 concepts, while inner union only 2...

Database practice, though, is messier than theory. If natural join is considered as a foundation of query language, then what is the method of handling columns like CREATED_BY and CREATED_ON? Is RDBMS supposed to automatically generate "end-user-query-oriented" views that exclude such columns? Or every transaction is recorded with provenance information so that tuple modification date and user can be queried separately. Still, this design seems to be untrustworthy when adding columns to the schema breaks existing queries. Are there more examples of the columns similar to CREATED_BY and CREATED_ON?

If the DBMS forces them to exist, then they're like any other attribute: project them away if you don't need them, rename them if you want to keep them and don't want to JOIN on them, etc.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
PreviousPage 3 of 5Next