## The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

# What is the purpose of relations containing tuples/relations?

Page 1 of 7Next

I was trying to understand the utility of GROUP, UNGROUP, WRAP, and UNWRAP. It's clear that something like them must exist, since tuples (and therefore relations) are allowed to contain nested tuples and relations.  But what is the utility of such nesting other than pure orthogonality?  Are there known practical applications?

Nested relations I can see, a little bit: they create something like a hierarchical relation, associating a scalar in one attribute with a relation containing an arbitrary number of tuples in a different attribute (which in turn can have a relation-valued attribute with arbitrary numbers of tuples, and so on).  But there are better ways to represent that without direct nesting, ways that can handle variable and arbitrary depths rather than only a fixed depth.  Nested tuples I don't see at all: what do they do that having separate attributes cannot?

Update:  I have now read Chapter 8 of Database Explorations, which helps a bit, though not about nested tuples (which are isomorphic to relations with a fixed cardinality of 1).  I also note that the Maybe types I've been discussing can be modeled as nested relations of degree 1 with either 1 tuple (Just x) or 0 tuples (Nothing).

Second update:  If you actually want 3VL, you can define it with the values (Just true), (Just false), and Nothing, which can be modeled with a nested relation with one boolean attribute and 0 or 1 tuples, a special case of the above.  But the key point is that the operators of 3VL have nothing to do with the 2VL operators!  Indeed, there are three basic dyadic operations which I will call TRI_AND, TRI_OR, and TRI_MERGE.  TRI_AND returns true if both operands are (Just true), but if one operand is (Just false) or Nothing, then it is returned.  Note that one needs a bias, such as left bias or Just-over-Nothing bias or Nothing-over-Just bias,to determine what TRI_AND((Just false), Nothing) and TRI_AND (Nothing, (Just false)) return.  Similarly, TRI_OR returns (Just false) if both operands are (Just false), and mutatis mutandis; TRI_MERGE returns Nothing if both operands are Nothing, and mutatis mutandis.  The SQL 3VL operators have Nothing-over-Just bias.  Lastly, TRI_NOT returns (Just false) on (Just true), (Just true) on (Just false), and Nothing on Nothing.

Nested relations are extremely useful as soon as you start to think about aggregation. We've had lengthy discussions on this forum about the underlying theoretical basis for grouping and aggregation, and their relationship to projection and certain kinds of joins. They really matter.

Nested tuples are probably mostly there for orthogonality. I don't remember finding a compelling use for them.

Andl - A New Database Language - andl.org

The canonical example of where RVA's are "needed"/"inevitable" is a catalog relvar that records which keys each relvar has.

It gives propositions like "{A C} is a key to relvar X" and "{C H} is a key to relvar X".

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that caries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user :

"relvar X has a key 7" and "relvar X has a key 83" and

"key 7 includes attribute A" and "key 7 includes attribute C" and "key 83 includes attribute H" and "key 83 includes attribute C".

(Observe the pitfalls for people to stumble into with the latter design if and when the proposition "{} is a key to relvar X" is valid and possible".)

A second example I once came up with is recording the methods of a java class in a database.  Note that a java method is identified (within a given class) by its "signature" which is the combination of its invocation name and the ordered list of parameter type declarations.  The latter, when relationally modeled, becomes a relation with attributes "ordinal position" and "type name", and note that this relation is part of the identifier for a method.

That's, ingeneral, when RVA's could be beneficial : if there's some sort of set-level or set-like value that is part of an identifier for something else.

And last but not least, outside the realm of base relvars : having (pun intended) GROUP and RVA's to one's avail, facilitates writing SQL stuff like "HAVING COUNT(*) > 1" as a regular WHERE.  Hugh feels strongly about how this simplifies the language per se.

Quote from johnwcowan on June 13, 2019, 12:46 am

I was trying to understand the utility of GROUP, UNGROUP, WRAP, and UNWRAP. It's clear that something like them must exist, since tuples (and therefore relations) are allowed to contain nested tuples and relations.  But what is the utility of such nesting other than pure orthogonality?  Are there known practical applications?

I use Rel for day-to-day desktop data crunching, the sorts of things often done with MS Excel or Access, and regularly have occasion to use GROUP and UNGROUP. If time permits, I'll try to find a concise example, as a specific one doesn't come to mind at the moment.

I don't recall ever using WRAP and UNWRAP.

Quote from Erwin on June 13, 2019, 9:50 am

The canonical example of where RVA's are "needed"/"inevitable" is a catalog relvar that records which keys each relvar has.

It gives propositions like "{A C} is a key to relvar X" and "{C H} is a key to relvar X".

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that caries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user

In this particular case, I think giving a key a user-visible name is a Good Thing so that it can be manipulated.

A second example I once came up with is recording the methods of a java class in a database.  Note that a java method is identified (within a given class) by its "signature" which is the combination of its invocation name and the ordered list of parameter type declarations.  The latter, when relationally modeled, becomes a relation with attributes "ordinal position" and "type name", and note that this relation is part of the identifier for a method.

I think that's a more compelling case, though modeling it as an array makes sense too, since the essence of arrays is that they are ordinally indexed and their size is not part of their type.

And last but not least, outside the realm of base relvars : having (pun intended) GROUP and RVA's to one's avail, facilitates writing SQL stuff like "HAVING COUNT(*) > 1" as a regular WHERE.  Hugh feels strongly about how this simplifies the language per se.

I don't understand this case; can you give a concrete example?  Thanks.

Quote from johnwcowan on June 13, 2019, 7:44 pm
Quote from Erwin on June 13, 2019, 9:50 am

The canonical example of where RVA's are "needed"/"inevitable" is a catalog relvar that records which keys each relvar has.

It gives propositions like "{A C} is a key to relvar X" and "{C H} is a key to relvar X".

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that caries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user

In this particular case, I think giving a key a user-visible name is a Good Thing so that it can be manipulated.

In Tutorial D, you'd presumably need to go from this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY {x}, KEY {y, z};`

To something like this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY MyVar_Key1 {x}, KEY Myvar_Key2 {y, z};`

When the only manipulation you're ever likely to do, rarely, is something like this?

`ALTER MyVar KEY {y}, KEY {x, z};`

I'm not yet convinced that's a Good Thing.

Quote from Dave Voorhis on June 13, 2019, 8:31 pm

In Tutorial D, you'd presumably need to go from this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY {x}, KEY {y, z};`

To something like this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY MyVar_Key1 {x}, KEY Myvar_Key2 {y, z};`

When the only manipulation you're ever likely to do, rarely, is something like this?

`ALTER MyVar KEY {y}, KEY {x, z};`

I'm not yet convinced that's a Good Thing.

What if you realize that one attribute of your key was {in,ex}cluded in error?  Then you might have

VAR MyVar REAL REATION {...} KEY {a, b, c, d, g, h};

and need to add e because future tuples will not be unique on those six fields.  Then you need to say one of these:

ALTER MyVar KEY {a, b, c, d, g, h} ADD {e};

ALTER MyVar KEY {a, b, c, d, g, h} TO {a, b, c, d, e, g, h};

BEGIN TRANSACTION;
ALTER MyVar DELETE KEY {a, b, c, d, g, h};
ALTER MYVAR KEY {a, b, c, d, e, g, h};
COMMIT;

The first is tolerable but annoying; the second is error-prone.  The third is doubly error-prone: you might screw up the attribute sets or forget the transaction wrapper.

But with ALTER MyVar CREATE KEY Key1 {a, b, c, d, g, h} you can say ALTER MyVar KEY Key1 ADD {e}, no transaction needed, end of story.  Related operations are ALTER MyVar KEY Key1 REMOVE {a} and ALTER MyVar REMOVE KEY Key1.

Note also that key names should not have to be globally unique; this requires a foreign-key declaration to explicitly name the parent relvar, but that is no great burden, something like FOREIGN KEY LinkKey {a, r, q} TO RELVAR YourVar KEY YourKey.

Quote from johnwcowan on June 13, 2019, 9:01 pm
Quote from Dave Voorhis on June 13, 2019, 8:31 pm

In Tutorial D, you'd presumably need to go from this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY {x}, KEY {y, z};`

To something like this:

`VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY MyVar_Key1 {x}, KEY Myvar_Key2 {y, z};`

When the only manipulation you're ever likely to do, rarely, is something like this?

`ALTER MyVar KEY {y}, KEY {x, z};`

I'm not yet convinced that's a Good Thing.

What if you realize that one attribute of your key was {in,ex}cluded in error?  Then you might have

VAR MyVar REAL REATION {...} KEY {a, b, c, d, g, h};

and need to add e because future tuples will not be unique on those six fields.  Then you need to say one of these:

ALTER MyVar KEY {a, b, c, d, g, h} ADD {e};

ALTER MyVar KEY {a, b, c, d, g, h} TO {a, b, c, d, e, g, h};

BEGIN TRANSACTION;
ALTER MyVar DELETE KEY {a, b, c, d, g, h};
ALTER MYVAR KEY {a, b, c, d, e, g, h};
COMMIT;

The first is tolerable but annoying; the second is error-prone.  The third is doubly error-prone: you might screw up the attribute sets or forget the transaction wrapper.

But with ALTER MyVar CREATE KEY Key1 {a, b, c, d, g, h} you can say ALTER MyVar KEY Key1 ADD {e}, no transaction needed, end of story.  Related operations are ALTER MyVar KEY Key1 REMOVE {a} and ALTER MyVar REMOVE KEY Key1.

Note also that key names should not have to be globally unique; this requires a foreign-key declaration to explicitly name the parent relvar, but that is no great burden, something like FOREIGN KEY LinkKey {a, r, q} TO RELVAR YourVar KEY YourKey.

Maybe I'm missing something here, but if I inadvertently omit desired attributes or include undesired attributes in a relvar's KEY specification, can't I simply do another ALTER MyVar KEY ... ?

Your approach seems like more complexity in general to simplify problems that rarely happen, whereas my approach (which is implemented in Rel) is usually simple but a bit more complex for problems that rarely happen.

Though it appears the choice of approach is a matter of subjective preference rather than an objectively logical distinction.

Quote from Erwin on June 13, 2019, 9:50 am

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that caries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user :

This seems to me spurious. The implication would be that every kind of database design with a hierarchical element should be modelled as a nested relvar unless the nested element has a key provided by or at least acceptable to the user. Alternatively, that surrogate keys have no place in database design. Additionally that normalisation even to 3NF is only performed if it won't generate surrogate keys.

My view is that nested relations are a really important capability to use in queries, but should almost never be used in a database relvar. If that results in a need for surrogate keys, then we already know how to deal with them. D&D have written extensively on the subject.

Andl - A New Database Language - andl.org
Quote from dandl on June 14, 2019, 12:36 am
Quote from Erwin on June 13, 2019, 9:50 am

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that caries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user :

This seems to me spurious.

This seems to me an attitude born of too much contact with SQL/coming to see its limitations as somehow inevitable. For the case Erwin notes (recording in the catalog the keys for a relvar), using an RVA (and making that the key) seems the only natural approach. As Erwin says, there's no other identifier that carries business meaning. Introducing surrogates like `Key1, Key2, Key3, ...` is spurious and arbitrary and easily muddled if you want to add/remove/change a key.

The implication would be that every kind of database design with a hierarchical element should be modelled as a nested relvar unless the nested element has a key provided by or at least acceptable to the user.

No Erwin did not say "every ... design". I can think of two examples where

a) it wouldn't be ergonomic to model a hierarchy as nested RVAs; but

b) there's no 'natural' business identifier that could be used as key.

1. The AST for a language parser.
2. The chemical bonding structure for organic/long chain molecules.

These both strike me as a case for using something other than a relational database. (I'm not saying it can't be done with a relational database; but I'm reminded of Dr Johnson's remark " ... like a dog's walking on his hind legs. It is not done well; but you are surprised to find it done at all.")

Alternatively, that surrogate keys have no place in database design.

Shall we do the traditional thing and separate logical design from physical? Then the place for surrogate keys is the physical design -- by which I mean the design for SQL, because it doesn't have RVAs; and even for some other DBMSs which allow 'repeating groups', AFAIAA they don't allow repeating groups as keys. I do say surrogate keys have no place in logical design. Where by "surrogate key" I mean some arbitrarily-chosen identifier not derived in any algorithmic way from the content of the tuple it denotes.

Additionally that normalisation even to 3NF is only performed if it won't generate surrogate keys.

Again I think you're confusing logical design with physical design.

My view is that nested relations are a really important capability to use in queries, but should almost never be used in a database relvar.

I disagree. But we're lacking practical evidence/experience because so few DBMSs support RVAs let alone RVAs as keys.

If that results in a need for surrogate keys, then we already know how to deal with them. D&D have written extensively on the subject.

Thinking back over the multiple rounds of discussions, it's been really difficult to define "surrogate key" in such a way to distinguish it from pointer (OO Pro 2).

Page 1 of 7Next