The Forum for Discussion about The Third Manifesto and Related Matters


What is the purpose of relations containing tuples/relations?

Quote from johnwcowan on June 13, 2019, 12:46 am

I was trying to understand the utility of GROUP, UNGROUP, WRAP, and UNWRAP. It's clear that something like them must exist, since tuples (and therefore relations) are allowed to contain nested tuples and relations.  But what is the utility of such nesting other than pure orthogonality?  Are there known practical applications?

You've already commented on a thread that showed the utility of GROUP/UNGROUP and RVAs.

Look at the thread started a few months ago by @asuling 'Support for JOIN alternatives' -- i.e. something corresponding to OUTER JOIN (loosely speaking). Dave made one suggestion (and implemented it in Rel); I made an alternative suggestion. Bothe suggestions are better than your Maybe type IMO. For me, the strong benefit of Tutorial D's GROUP is that it is information-preserving: you can always UNGROUP back to the 'flattened' relation. Then it's a matter of ergonomics for querying whether you store your base relation as GROUPed or flattened. (With SQL that's a non-option.)

As Dave remarked, WRAP/UNWRAP and TVAs I find more difficult to motivate -- perhaps as a 'poor man's' product type: attribute type Point as a TVA {XCoord RAT, YCoord RAT}. This might be an improvement over TTM's (RM Pre 4) selector operators/components, because those construct compound values by position, not name; and because the multiple PossReps idea is rather idiosyncratic. YMMV.
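To make that suggestion concrete, a sketch (again with made-up names) of a TVA together with the WRAP/UNWRAP round-trip:

VAR SHAPES REAL RELATION { ID CHARACTER, Point TUPLE { XCoord RATIONAL, YCoord RATIONAL } } KEY { ID };

/* flatten the TVA out into ordinary attributes XCoord and YCoord ... */
SHAPES UNWRAP Point

/* ... and nest them again; as with GROUP/UNGROUP, nothing is lost either way */
( SHAPES UNWRAP Point ) WRAP ( { XCoord, YCoord } AS Point )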

Then we can throw away RM Pre 4, and it's sets all the way down. (Note that the ordered pair of <Attribute name, value> can be modeled as a set; see https://en.wikipedia.org/wiki/Ordered_pair#Defining_the_ordered_pair_using_set_theory . The now-accepted Kuratowski definition works particularly smoothly for TTM, because Attribute name is always atomic. This is also TTM's riposte to D L Childs.)

 

 

Quote from johnwcowan on June 13, 2019, 9:01 pm
Quote from Dave Voorhis on June 13, 2019, 8:31 pm

In Tutorial D, you'd presumably need to go from this:

VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY {x}, KEY {y, z};

To something like this:

VAR MyVar REAL RELATION {x INT, y INT, z CHAR} KEY MyVar_Key1 {x}, KEY MyVar_Key2 {y, z};

When the only manipulation you're ever likely to do, rarely, is something like this?

ALTER MyVar KEY {y}, KEY {x, z};

I'm not yet convinced that's a Good Thing.

What if you realize that one attribute of your key was {in,ex}cluded in error? ...

The 'business rules' for keys to relvars (or indeed keys to any relation value, inferred per RM VSS 2) are:

  • each key must be unique (taking its component attributes as a set);
  • no key's attribute set can be a proper superset of some other key.

If we take a relvar's keys as a set of sets of attribute names: bullet 1 is satisfied directly (you can't insert duplicates into a set); bullet 2 is easily satisfied as a constraint expressed using set operators.
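As a sketch of bullet 2 as a database constraint -- assuming a hypothetical catalog relvar RELVAR_KEYS in which each relvar's keys are held as a set of attribute-name sets (all names here are invented for illustration):

VAR RELVAR_KEYS REAL RELATION { RelvarName CHARACTER, Keys RELATION { TheKey RELATION { AttrName CHARACTER } } } KEY { RelvarName };

/* no key of a relvar may be a proper superset of another key of the same relvar */
CONSTRAINT NO_REDUNDANT_KEYS IS_EMPTY (
  ( ( ( RELVAR_KEYS UNGROUP Keys ) RENAME ( TheKey AS K1 ) )
    JOIN
    ( ( RELVAR_KEYS UNGROUP Keys ) RENAME ( TheKey AS K2 ) ) )
  WHERE IS_EMPTY ( K2 MINUS K1 ) AND NOT ( K1 = K2 ) );

Bullet 1 needs no constraint at all: Keys is a set, so a duplicate key simply cannot be recorded.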

Whereas if keys are named (and there might be arbitrarily many of them/the number of keys declared for some relvar might vary over the life of the database), you're going to have a hard time expressing those constraints.

ALTER MyVar KEY {a, b, c, d, g, h} ADD {e};

 

No, you're not going to have your users directly updating the catalog using D statements, because updating the catalog does not of itself create a relvar or alter its keys. There will be a CLI front-end that will build/update the relvar and generate the update statements to the catalog (under transaction control).

Quote from AntC on June 14, 2019, 4:56 am
Quote from dandl on June 14, 2019, 12:36 am
Quote from Erwin on June 13, 2019, 9:50 am

The standard process of "flattening out" (replacing the set-level value with an identifier and listing the set contents in another relvar) can be applied, but unless you can find an identifier that carries business meaning, you are introducing surrogates in the design, which are then subsequently very likely to get exposed to the user:

This seems to me spurious.

This seems to me an attitude born of too much contact with SQL/coming to see its limitations as somehow inevitable.

You know absolutely nothing about my contact or otherwise with SQL. Your assertion is as spurious as your other claims.

For the case Erwin notes (recording in the catalog the keys for a relvar), using an RVA (and making that the key) seems the only natural approach. As Erwin says, there's no other identifier that carries business meaning. Introducing surrogates like Key1, Key2, Key3, ... is spurious and arbitrary and easily muddled if you want to add/remove/change a key.

I agree that no other identifier offers itself; I disagree that using an RVA is 'natural'. Perhaps you would like to propose a fully worked example either to support your case or to allow me to demonstrate its other shortcomings.

These both strike me as a case for using something other than a relational database. (I'm not saying it can't be done with a relational database; but I'm reminded of Dr Johnson's remark " ... like a dog's walking on his hind legs. It is not done well; but you are surprised to find it done at all.")

Why on earth would you provide an example where the RM is not a good choice in support of the proposition that using RVAs in the RM is a good idea? I see no point in pursuing the issue.

Shall we do the traditional thing and separate logical design from physical? Then the place for surrogate keys is the physical design -- by which I mean the design for SQL, because it doesn't have RVAs; and even for some other DBMSs which allow 'repeating groups', AFAIAA they don't allow repeating groups as keys. I do say surrogate keys have no place in logical design. Where by "surrogate key" I mean some arbitrarily-chosen identifier not derived in any algorithmic way from the content of the tuple it denotes.

Now we're headed off into places completely unrelated to the original issue. Apart from suggesting some reading (such as https://en.wikipedia.org/wiki/Surrogate_key) I'll just let that one through to the keeper.

Thinking back over the multiple rounds of discussions, it's been really difficult to define "surrogate key" in such a way as to distinguish it from a pointer (OO Pro 2).

Are you serious? No comment.

Your proposition as I understood it was: RVAs should be chosen to represent certain kinds of data in a relvar because to do otherwise would be to introduce surrogate keys that do not have business meaning but might become visible to the business. Although database design, normalisation, surrogate keys and related topics have an extensive literature taking into account a myriad of factors, you would like to short-circuit the process: problem X, surrogate keys bad, RVAs available, use RVAs, problem solved.

My view is that this kind of simplistic analysis will see RVAs used where they should not be; that surrogate keys are not the real issue; and that even where there may be a case for using them, there are real problems they cause that do not otherwise arise. But you've got a hammer and you've found a problem that looks like a nail, so have at it!

Andl - A New Database Language - andl.org
Quote from dandl on June 14, 2019, 11:33 am

[...]

My view is that this kind of simplistic analysis will see RVAs used where they should not be; that surrogate keys are not the real issue; and that even where there may be a case for using them, there are real problems they cause that do not otherwise arise. But you've got a hammer and you've found a problem that looks like a nail, so have at it!

I'm surprised this is such an emotional issue.

In Rel, keys are represented in the catalog using RVAs, where the relevant bits of the sys.Catalog relvar are defined as:

RELATION {Name CHARACTER, ... , Keys RELATION {Attributes RELATION {Name CHARACTER}}}

It started out as an experiment in defining base relvars with RVAs. I guess the experiment was a success because it's still there, It Just Works, and there aren't enough downsides to make it worth dropping the RVA. But had I started out with keys defined in a separate relvar, I'm sure that would have been fine too.
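For illustration only -- assuming nothing beyond the fragment above -- getting a relvar's keys back out as a relation of attribute-name sets is just a restriction, a projection and an UNGROUP:

/* one tuple per key of MyVar; each key is itself a set of attribute names */
( ( sys.Catalog WHERE Name = 'MyVar' ) { Keys } ) UNGROUP Keys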

This looks like a case where if RVAs are available and work for the case at hand, they're fine. If RVAs aren't available or don't work for the case at hand, don't use them.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on June 14, 2019, 5:25 am

For me, the strong benefit of Tutorial D's GROUP is that it is information-preserving: you can always UNGROUP back to the 'flattened' relation.

I guess that's the best point: RVAs are useful because they make GROUP work smoothly.

As Dave remarked, WRAP/UNWRAP and TVAs I find more difficult to motivate -- perhaps as a 'poor man's' product type: attribute type Point as a TVA {XCoord RAT, YCoord RAT}. This might be an improvement over TTM's (RM Pre 4) selector operators/components, because those construct compound values by position, not name; and because the multiple PossReps idea is rather idiosyncratic. YMMV.

A good candidate for multiple possreps is complex numbers.  In Fortran, C, C++, Python, and Julia, there is only one possrep, and that's the rectangular one: a pair <real, imaginary>.  In the Lisps and Pure, though, there are two equally valid possreps: the pair <real, imaginary> and the pair <magnitude, angle>.  There is no way to tell how a particular complex number was created, and all operations are available on both possreps.  (In all implementations I know of, the physical representation is rectangular.)
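In Tutorial D terms that would presumably be declared along the lines of the POINT example in DTATRM, adapted here as a sketch (names and literals purely illustrative):

TYPE COMPLEX
  POSSREP CARTESIAN { RE RATIONAL, IM RATIONAL }
  POSSREP POLAR { R RATIONAL, THETA RATIONAL } ;

/* both selectors denote values of the one type COMPLEX; the physical representation stays hidden */
/* e.g. CARTESIAN ( 3.0, 4.0 ) and POLAR ( 5.0, 0.9273 ) denote (approximately) the same value, taking THETA in radians */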

Note: "Business" applications shouldn't use floats for money amounts, of course, but many businesses do deal in measurements, and measurements are inherently inexact, making hardware-assisted float arithmetic useful even though numerically incorrect -- which is why it has to be a separate scalar type.

 

Quote from johnwcowan on June 14, 2019, 6:39 pm
Quote from AntC on June 14, 2019, 5:25 am

As Dave remarked, WRAP/UNWRAP and TVAs I find more difficult to motivate -- perhaps as a 'poor man's' product type: attribute type Point as a TVA {XCoord RAT, YCoord RAT}. This might be an improvement over TTM's (RM Pre 4) selector operators/components, because those construct compound values by position, not name; and because the multiple PossReps idea is rather idiosyncratic. YMMV.

A good candidate for multiple possreps ...

Heh heh. There you go trampling over ground that's already well ploughed. Multiple possreps was originally aimed at format or units conversion. For date formats it works well.

For (say) metric/imperial or temperature scale conversion already there's difficulty: two slightly different metric values might convert to the same imperial; or converting metric to imperial back to metric might return a different value. The practicalities are because of "hardware-assisted float arithmetic", as you say below. Presumably that's why Tutorial D eschews Floats and has only RATIONAL.

So the difficulty is RM Pre 8 "the very same value". Is 1 inch "the very same value" as 0.0254 metres? Of course stories abound of NASA/international consortiums using different units with conversions that are not accurate enough when scaled up to distances in space. What if your database is getting readings from different sensors calibrated in different units? Rel, it appears, uses multiple possreps to record not only the value but whatever units it was captured in. Furthermore if you use the default table browser, it'll display the recorded value and units as entered (i.e. differently), even if a comparison says they're equal. This seems to me in clear violation of RM Pre 8 (Leibniz equality)

"It follows from this prescription that if (a) there exists an operator Op (other than “=” itself) with a parameter P of declared type T such that (b) two successful invocations of Op that are identical in all respects except that the argument corresponding to P is v1 in one invocation and v2 in the other are distinguishable in their effect, then (c) v1 = v2 must evaluate to FALSE."

These conversion effects get worse with representing complex numbers, apart from units conversion:

... is complex numbers.  In Fortran, C, C++, Python, and Julia, there is only one possrep, and that's the rectangular one: a pair <real, imaginary>.  In the Lisps and Pure, though, there are two equally valid possreps: the pair <real, imaginary> and the pair <magnitude, angle>.

So is a polar <5.0, 0°> "the very same value" as <5.0, 360°>? They don't look the same. If you convert or store both to rectangular and convert back to polar, you don't get back what you started with. How about polar <0.0, n°> for any n?

There is no way to tell how a particular complex number was created, and all operations are available on both possreps.  (In all implementations I know of, the physical representation is rectangular.)

Again if D&D kept up with modern languages ... If the original units of record were critical information, your data type should be a tagged union, wot TTM does not 'ave. Then in general for RM Pre 8, 1 inch is not the very same value as 0.0254 m. But you might support a comparison operator for approximately-equal-upon-conversion. And if your language supports tagged unions it might go further and support something like Haskell's view patterns or pattern synonyms to represent all values (approximately) in some common units.

Note: "Business" applications shouldn't use floats for money amounts, of course, but many businesses do deal in measurements, and measurements are inherently inexact, making hardware-assisted float arithmetic useful even though numerically incorrect -- which is why it has to be a separate scalar type.

 

RM Pre 4's distinction between physical representation vs "possible" representation in the language is important (and Oh! that SQLers understood it better). But the multiple possreps bit I find fraught with difficulties; I just would not do that.

Quote from AntC on June 15, 2019, 2:05 am

For (say) metric/imperial or temperature scale conversion already there's difficulty: two slightly different metric values might convert to the same imperial; or converting metric to imperial back to metric might return a different value. The practicalities are because of "hardware-assisted float arithmetic", as you say below. Presumably that's why Tutorial D eschews Floats and has only RATIONAL.

Conversions between metric, Imperial, and U.S. Customary have in fact been exact for decades, and can be handled correctly by exact rational arithmetic.  The trouble in that case is that the measurement itself is not exact.  The conventional adult human body temperature in the U.S. is the very precise-sounding 98.6 degrees F, especially when you consider that a degree F is only 5/9 the size of a degree C.  But in fact this is just the result of applying the exact conversion F = C / 5 * 9 + 32 to the very rough temperature of 37 degrees C, which is averaged over many observations.

So the difficulty is RM Pre 8 "the very same value". Is 1 inch "the very same value" as 0.0254 metres?

It is, provided the inch is exactly an inch!

 Rel, it appears, uses multiple possreps to record not only the value but whatever units it was captured in. Furthermore if you use the default table browser, it'll display the recorded value and units as entered (i.e. differently), even if a comparison says they're equal. This seems to me in clear violation of RM Pre 8 (Leibniz equality)

If it does exact arithmetic, then that's fine.  After all, there should be no complaint if a system sometimes prints 0.2 and sometimes 1/5.

So is a polar <5.0, 0°> "the very same value" as <5.0, 360°>? They don't look the same. If you convert or store both to rectangular and convert back to polar, you don't get back what you started with.

You don't, but that's because 360 degrees is 2π, and almost no computer languages, except for those that do symbolic computation, can handle irrationals at all.  (It's not impossible: a language might support real numbers of the special form a+bπ, for example.)  It's a different order of problem.  For this reason the angle selector is one of the very few operators in Scheme that can return an inexact result on exact arguments.

Quote from AntC on June 15, 2019, 2:05 am
Quote from johnwcowan on June 14, 2019, 6:39 pm
Quote from AntC on June 14, 2019, 5:25 am

A good candidate for multiple possreps ...

Heh heh. There you go trampling over ground that's already well ploughed. Multiple possreps was originally aimed at format or units conversion. For date formats it works well.

For (say) metric/imperial or temperature scale conversion already there's difficulty: two slightly different metric values might convert to the same imperial; or converting metric to imperial back to metric might return a different value. The practicalities are because of "hardware-assisted float arithmetic", as you say below. Presumably that's why Tutorial D eschews Floats and has only RATIONAL.

So the difficulty is RM Pre 8 "the very same value". Is 1 inch "the very same value" as 0.0254 metres? Of course stories abound of NASA/international consortiums using different units with conversions that are not accurate enough when scaled up to distances in space. What if your database is getting readings from different sensors calibrated in different units? Rel, it appears, uses multiple possreps to record not only the value but whatever units it was captured in. Furthermore if you use the default table browser, it'll display the recorded value and units as entered (i.e. differently), even if a comparison says they're equal. This seems to me in clear violation of RM Pre 8 (Leibniz equality)

Not just well-ploughed ground... We've worn a trench several feet deep.

I appreciate the opposition to Rel's approach. I've mentioned before that as a user of the facility, I like it. It eases development and debugging. This is a case of pragmatics (and pedagogically-helpful transparent casing so you can see inside) over purity. (I know, "BUT, BUT PURITY!" But pragmatics & pedagogy! Etc.)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from johnwcowan on June 15, 2019, 3:33 am
Quote from AntC on June 15, 2019, 2:05 am

For (say) metric/imperial or temperature scale conversion already there's difficulty: two slightly different metric values might convert to the same imperial [...]

Rel, it appears, uses multiple possreps to record not only the value but whatever units it was captured in. Furthermore if you use the default table browser, it'll display the recorded value and units as entered (i.e. differently), even if a comparison says they're equal. This seems to me in clear violation of RM Pre 8 (Leibniz equality)

If it does exact arithmetic, then that's fine.

It doesn't do exact arithmetic, because Rel's RATIONAL is a thin skin around Java's double. It's fine for pedagogy and close enough for government work.

I'm sure I could fix this but other priorities, etc.

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from AntC on June 14, 2019, 5:48 am

No, you're not going to have your users directly updating the catalog using D statements, because updating the catalog does not of itself create a relvar or alter its keys. There will be a CLI front-end that will build/update the relvar and generate the update statements to the catalog (under transaction control).

Funny.  That's exactly what I do/did, and at some point, back when the forum was still on mail, I learned that that's exactly what Hugh did in BS12 too.  At least that's how I understood it.
