The Forum for Discussion about The Third Manifesto and Related Matters


What is the purpose of relations containing tuples/relations?

Page 4 of 7
Quote from dandl on June 15, 2019, 2:46 pm
Quote from AntC on June 15, 2019, 2:08 pm

Whereas confronting learners with 'multiple possreps' is of negative pedagogical value: no other language/DBMS does that.

Per contra, many languages expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types. The Java immutable class Instant exposes no less than six, taking these respective arguments:

  1. the number of seconds since the Epoch,
  2. the number of milliseconds since the Epoch,
  3. the number of seconds and the number of nanoseconds within that second since the Epoch
  4. the ISO 8601 string representation,
  5. an instance of a supertype that includes various other time-related values/objects in Java,
  6. an instance of a user-controlled clock (typically one that doesn't tick at all for the sake of reproducibility).

A few languages provide a moderately seamless way to convert from/to some base representation; but they clearly say on the tin one of the representations is base, the others derived;

It so happens that number 3 above is the physical representation (at least in Oracle's Java), but I only know that because I looked at the source code.  The documentation doesn't say, nor should it.

This really has been discussed at length on this forum. I think the consensus is that TTM was ahead of its time but now lags in this area; that tagged unions are a better thing

Yesterday I read (I fear with small understanding of much of it) "Type-Indexed Rows" (Shields & Meijer, 2001). The core idea there, which I do think has something to offer the RM community, is that rather than elements of tuples being required to have unique ordinals identifying their members (as in Python or SQL) or unique names (as in D), they are required to have unique types. This means that a tuple type is just the product of its component types, and a sum type is perfectly aligned with it, its instances being any of the specified types (by definition distinct). Of course this requires the ability to create new scalar types with only one component a la Haskell newtype, but D guarantees that. We see this already being done in the TTM book with types like S# and P# (which I tend to read as "S-sharp", "P-sharp"). It is certainly theoretically neater for each component of a tuple to be a pair rather than a triple, and since each type has to have a name, the advantages for a programming language are the same.

Update:  I forgot to add that it does create a problem for TCLOSE, where the attributes in TTM are expected to have different names but the same type.  It might be necessary to have a variety of = that can pierce the newtype veil.

Quote from johnwcowan on June 15, 2019, 5:50 pm
Quote from AntC on June 15, 2019, 2:13 pm

 

? We're all thoroughly metric down here. A miss is as good as a kilometre. And if you're approximating your inches as 0.0254 metres, you'd be missing by a great deal more than that.

"A gramme of prevention is worth a kilogramme of cure."  —Old Aussie proverb

"Ay, every centimetre a king!"  —King Lear, government schools edition

But 0.0254m is no approximation: it is the definition of an inch, by the international agreement of 1959 by which the national standards bodies of Australia, Canada, New Zealand, South Africa, the UK, and the U.S. agreed that the yard (36 inches) would thenceforth be exactly 0.9144m and the avoirdupois pound (16 ounces) exactly 0.45359237 kg.

Speaking as another Antipodean, we have been metric since 1974, mostly. We measure TVs and some clothes in inches, sheep stations in millions of acres and sometimes rainfall in inches (or feet). But we get close.

Yes, the old Imperial measures have all been metricated, domesticated and mostly eradicated, with AFAICT the sole exception of the UK pint due to its long relationship with beer. The exact conversion factors are usable when needed.

The US measures are alive and well and not so well metricated. Few will be aware that the US fl oz is 29.5735295625ml (exactly). Even fewer will care.

The key point is that even the weird numbers are still exact conversions. Since every measurement is likewise exact (albeit with a presumed error), every value that has ever been obtained or calculated directly from  real world observations is exact. Most quantities and values that we deal with would be best represented as an exact value and an associated unit of measurement, ideally with a range of automatic conversions to other units as required. I remain puzzled that in the decades I've been in the business, this has not been more widely recognised and remains a frequent source of serious errors. Java provides an excellent example of how to mess it up, but it certainly is not alone.
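As a rough illustration of "an exact value and an associated unit of measurement", here is a minimal Python sketch using rational arithmetic. The `Length` class, the `METRES_PER` table and the `to` method are hypothetical names invented for this example. The inch and yard factors are the 1959 definitions quoted above; the fluid-ounce derivation assumes the usual definition of the US gallon as 231 cubic inches, divided into 128 fluid ounces, which is not stated in this thread.

```python
from dataclasses import dataclass
from fractions import Fraction

# Exact factors to the base unit (metres). Hypothetical table for illustration;
# the inch and yard values follow the 1959 agreement cited above.
METRES_PER = {
    "m": Fraction(1),
    "in": Fraction("0.0254"),
    "yd": Fraction("0.9144"),
}


@dataclass(frozen=True)
class Length:
    """An exact rational value paired with its unit of measurement."""
    value: Fraction
    unit: str

    def to(self, unit: str) -> "Length":
        # Convert via the base unit; Fraction arithmetic never rounds.
        return Length(self.value * METRES_PER[self.unit] / METRES_PER[unit], unit)


# Assumption: 1 US gallon = 231 cubic inches = 128 US fl oz.
# One cubic inch is (2.54 cm)^3, and 1 cm^3 = 1 ml.
US_FL_OZ_IN_ML = Fraction(231, 128) * Fraction("2.54") ** 3

print(Length(Fraction(36), "in").to("yd"))           # exactly one yard
print(US_FL_OZ_IN_ML == Fraction("29.5735295625"))   # True
```

Because every factor is a ratio of integers, converting there and back returns exactly the starting value, which is the property that binary floating point cannot promise.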

Andl - A New Database Language - andl.org
Quote from dandl on June 16, 2019, 12:51 am

The US measures are alive and well and not so well metricated. Few will be aware that the US fl oz is 29.5735295625ml (exactly). Even fewer will care.

Few in the U.S. care either.  A few things (soda and alcohol) are sold in metric units, and otherwise the butter says "1 LB" and then in very small print somewhere "454 g".

The key point is that even the weird numbers are still exact conversions.

Just so.

Since every measurement is likewise exact (albeit with a presumed error),

No, I can't agree with you there.  "Exact" and "measurement" is a contradiction in terms.  Exact numbers come from counts (and only if they are not "too vulgar big") and from defined values. I have exactly ten fingers (so far), but each one of them can only be measured inexactly, whatever units I use.

Quote from johnwcowan on June 15, 2019, 9:59 pm
Quote from dandl on June 15, 2019, 2:46 pm
Quote from AntC on June 15, 2019, 2:08 pm

Whereas confronting learners with 'multiple possreps' is of negative pedagogical value: no other language/DBMS does that.

Per contra, many languages ...

John, if I may start with a general remark. Your posts over the past couple of days are jumping around grabbing bits and pieces from different languages, different approaches, different paradigms even. Each of those approaches/models might be coherent within themselves. There are certain 'family resemblances' (particularly between Codd's RA, TTM's D, Tropashko's systems, SQL less so). But you can't just muddle them all together and expect to arrive at something that's still coherent. So most of what you say below is either outright wrong, or so muddled as to be incoherent.

expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types.

Just. No. TTM does not have 'objects', that is, it does not have values encapsulated inside things whereby you must call methods to access the values. TTM's scalar types, Selector-defined types with components, attribute values and their types are all plain compare-by-value not compare-by-reference.

The Java immutable class Instant exposes no less than six, taking these respective arguments:

  1. the number of seconds since the Epoch,
  2. the number of milliseconds since the Epoch,
  3. the number of seconds and the number of nanoseconds within that second since the Epoch
  4. the ISO 8601 string representation,
  5. an instance of a supertype that includes various other time-related values/objects in Java,
  6. an instance of a user-controlled clock (typically one that doesn't tick at all for the sake of reproducibility).

There's nothing corresponds to that in TTM. Just. No. I'm on the verge of saying "not even wrong".

A few languages provide a moderately seamless way to convert from/to some base representation; but they clearly say on the tin one of the representations is base, the others derived;

It so happens that number 3 above is the physical representation (at least in Oracle's Java), but I only know that because I looked at the source code.  The documentation doesn't say, nor should it.

This really has been discussed at length on this forum. I think the consensus is that TTM was ahead of its time but now lags in this area; that tagged unions are a better thing

Yesterday I read (I fear with small understanding of much of it) "Type-Indexed Rows" (Shields & Meijer, 2001).  The core idea there, which I do think has something to offer the RM community, is that rather than elements of tuples being required to have unique ordinals identifying their members (as in Python or SQL) or unique names (as in D), they are required to have unique types.  This means that a tuple type is just the product of its component types, and a sum type is perfectly aligned with it, its instances being any of the specified types (by definition distinct).

IIRC S&M expect their rows to be product-types, that is, constructed positionally. So potentially two rows with the same set of element types but in a different sequence are not the same type. They wrap their constructors in smarts to avoid that being exposed, by defining a canonical ordering of types. This is one in a long series of approaches to row algebras dating back to the early 1990s. (The Gaster & Jones TRex paper 1996 has a brief survey of the literature. There's a persistent challenge of retaining Principal Typing, which different approaches tackle differently -- and some fail to tackle.)

wrt sum types you seem to be mixing them up with Variant types aka type-indexed co-products. In a sum type, the tags of the union are each within the overall type. Whereas with a Variant type, each tag is a distinct type. So you can distinguish a reading in Celsius as a different type vs a reading in Fahrenheit. Contrast that with a tagged union/sum type, where the readings are within the same type, so your code must unwrap the tag at value level, not type level, to find which.

There's potentially a very powerful records system that could be built using Shields & Meijer, aka type-indexed products in the HList paper. Maybe that could be extended to a relational-alike system. It's not TTM. S&M consider only rows in isolation. How do you extend it to multiple rows in multiple relations with inter-relational operations? Relational completeness requires a RENAME operation (or equivalent). It requires a restriction operation whereby we compare a value inside one attribute with a value inside a differently-named attribute. If different attribute name (in TTM terms) means different type (in S&M terms), how can you compare? You need at least a means to extract the value from inside its indexed type in the row.

None of that is insoluble. But it's not directly what TTM does.
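To make the comparison problem concrete, here is a minimal Python sketch of a type-indexed row. It is only an analogy, not the Shields & Meijer system: Python checks types at run time, and the wrapper classes `SupplierCity` and `PartCity` and the helper `row` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplierCity:
    """Hypothetical single-component wrapper type (newtype-like)."""
    value: str


@dataclass(frozen=True)
class PartCity:
    """A second wrapper over the same underlying type."""
    value: str


def row(*components):
    """Build a row indexed by component type: at most one component per type."""
    indexed = {type(c): c for c in components}
    if len(indexed) != len(components):
        raise TypeError("each component type may appear only once in a row")
    return indexed


r1 = row(SupplierCity("London"), PartCity("London"))
r2 = row(PartCity("London"), SupplierCity("London"))

print(r1 == r2)                                      # True: order is irrelevant
print(SupplierCity("London") == PartCity("London"))  # False: distinct types
# A cross-attribute restriction has to unwrap both sides first.
print(r1[SupplierCity].value == r1[PartCity].value)  # True
```

The last line is the "means to extract the value" mentioned above: without the explicit `.value` unwrapping, the two cities can never compare equal, however they are spelled.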

Of course this requires the ability to create new scalar types with only one component a la Haskell newtype, but D guarantees that. We see this already being done in the TTM book with types like S# and P# (which I tend to read as "S-sharp", "P-sharp").

I'll relate a bit of history: I suggested the way to build a database is first to define the 'data dictionary' (with types/attribute names S#, P#, etc). Then we could avoid having both a Selector S# and an attribute name S#, with the potential for putting an attribute in some relation named S# but of type P#. A side benefit would be that all relations are automatically join-compatible, as you mention on another thread. This was shouted down (despite there being one TTM implementation that actually does this for base relvars) on the grounds that people in ad-hoc queries want to be free to use ad-hoc attribute names like X, Y without having to first declare them in the data dictionary, and without being limited to using the same attribute type for all attributes named X -- even in the same query session.

But you go for it.

It is certainly theoretically neater for each component of a tuple to be a pair rather than a triple, and since each type has to have a name, the advantages for a programming language are the same.

Each component is really only an <attribute name, value> pair. The third element attribute type can be derived, thanks to RM Pre 2.

I certainly think there's merit in allowing <A, V> pairs as first-class citizens. Yes they'd be equivalent to Haskell newtypes with the data constructor name bound in. I don't see anything in TTM that prohibits that from your D.

Update:  I forgot to add that it does create a problem for TCLOSE, where the attributes in TTM are expected to have different names but the same type.  It might be necessary to have a variety of = that can pierce the newtype veil.

TCLOSE is just one example of needing inter-relational operations between different-named attributes.

Quote from AntC on June 16, 2019, 9:08 am
Quote from johnwcowan on June 15, 2019, 9:59 pm

expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types.

Just. No. TTM does not have 'objects', that is, it does not have values encapsulated inside things whereby you must call methods to access the values. TTM's scalar types, Selector-defined types with components, attribute values and their types are all plain compare-by-value not compare-by-reference.

The Java immutable class Instant exposes no less than six, taking these respective arguments:

  1. the number of seconds since the Epoch,
  2. the number of milliseconds since the Epoch,
  3. the number of seconds and the number of nanoseconds within that second since the Epoch
  4. the ISO 8601 string representation,
  5. an instance of a supertype that includes various other time-related values/objects in Java,
  6. an instance of a user-controlled clock (typically one that doesn't tick at all for the sake of reproducibility).

There's nothing corresponds to that in TTM. Just. No. I'm on the verge of saying "not even wrong".

I might quarrel with some of John's choice of terminology, e.g., "expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types." (emphasis mine) -- where "selected values of TTM types" would be preferable to "instances of TTM scalar types" to avoid even a hint of TTM/OO mashup -- and I can't find the Instant as it's described above. Instead, I find https://docs.oracle.com/javase/9/docs/api/java/time/Instant.html but its "static methods" do seem to somewhat resemble the description above.

However, John's point -- that the facility for multiple constructors for an immutable value class in a typical object-oriented language is notionally isomorphic to TTM's multiple possreps -- is sound:

In a typical object-oriented language, multiple constructors for a given class may be used to create various instances which have an unspecified and hidden internal representation, but that appropriately reflects the chosen constructor. All instances of that class can be compared with each other as if they had all been instantiated with the same constructor.

Similarly, in TTM, multiple possreps for a given type may be used to select various values which have an unspecified and hidden internal representation, but that appropriately reflects the chosen selector. All values of that type can be compared with each other as if they had all been selected with the same selector.
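The parallel can be sketched in a few lines of Python. This is a hedged illustration, not Java's `Instant` and not a TTM implementation: the `Moment` class and its four factory methods are hypothetical names, loosely modelled on the first four items in John's list, and the hidden representation chosen here (whole seconds plus nanoseconds) is an assumption of the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Moment:
    """Hypothetical immutable value type; callers never see the two fields."""
    _seconds: int
    _nanos: int

    @classmethod
    def of_epoch_second(cls, seconds: int) -> "Moment":
        return cls(seconds, 0)

    @classmethod
    def of_epoch_milli(cls, millis: int) -> "Moment":
        seconds, rem = divmod(millis, 1000)
        return cls(seconds, rem * 1_000_000)

    @classmethod
    def of_second_and_nano(cls, seconds: int, nanos: int) -> "Moment":
        # Normalise so that equal moments always share one representation.
        extra, nanos = divmod(nanos, 1_000_000_000)
        return cls(seconds + extra, nanos)

    @classmethod
    def parse(cls, text: str) -> "Moment":
        # Expects an ISO 8601 string with an explicit offset, e.g. "+00:00".
        delta = datetime.fromisoformat(text) - _EPOCH
        return cls(delta.days * 86_400 + delta.seconds, delta.microseconds * 1000)


a = Moment.of_epoch_second(1)
b = Moment.of_epoch_milli(1000)
c = Moment.of_second_and_nano(0, 1_000_000_000)
d = Moment.parse("1970-01-01T00:00:01+00:00")
print(a == b == c == d)  # True: the chosen factory leaves no trace
```

Whichever factory is used, the resulting values compare equal, which is the behaviour described for both constructors and selectors in the two paragraphs above.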

 

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from johnwcowan on June 16, 2019, 5:35 am

Since every measurement is likewise exact (albeit with a presumed error),

No, I can't agree with you there.  "Exact" and "measurement" is a contradiction in terms.  Exact numbers come from counts (and only if they are not "too vulgar big") and from defined values. I have exactly ten fingers (so far), but each one of them can only be measured inexactly, whatever units I use.

I think you may have clipped some context. That should be read as "every measurement is an exact number". You cannot make a measurement that is a fraction (1/3) or a transcendental (sin(0.1)). All measurements are exact numbers, to whatever precision is available, and with whatever error may be inherent in the method. This remains true after simple arithmetic that produces a result in the same or a derived unit, and should be enforced by judicious rounding.

Floating point (approximate) numbers come into their own when computing predicted values from a model. That's a surprisingly rare thing to do, but seems to have heavily coloured the design of early processors and language designs. Fortran anyone?
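A small Python sketch of the contrast, assuming readings are recorded as decimal values to a known precision. The function `average_mm`, its `precision` parameter and the sample readings are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
float_sum_is_exact = (0.1 + 0.2 == 0.3)                                    # False
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True


def average_mm(readings, precision=Decimal("0.1")):
    """Average decimal readings and round back to the instrument's precision."""
    total = sum(readings, Decimal(0))
    return (total / len(readings)).quantize(precision, rounding=ROUND_HALF_EVEN)


readings = [Decimal("12.3"), Decimal("12.4"), Decimal("12.4")]
print(float_sum_is_exact, decimal_sum_is_exact)  # False True
print(average_mm(readings))                      # 12.4
```

The `quantize` call is one way to do the "judicious rounding" mentioned above: the result is reported to the same precision as the inputs rather than to however many digits the division happens to produce.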

Quote from dandl on June 16, 2019, 2:27 pm
Quote from johnwcowan on June 16, 2019, 5:35 am

Since every measurement is likewise exact (albeit with a presumed error),

No, I can't agree with you there.  "Exact" and "measurement" is a contradiction in terms.  Exact numbers come from counts (and only if they are not "too vulgar big") and from defined values. I have exactly ten fingers (so far), but each one of them can only be measured inexactly, whatever units I use.

I think you may have clipped some context. That should be read as "every measurement is an exact number". You cannot make a measurement that is a fraction (1/3) or a transcendental (sin(0.1)). All measurements are exact numbers, to whatever precision is available, and with whatever error may be inherent in the method. This remains true after simple arithmetic that produces a result in the same or a derived unit, and should be enforced by judicious rounding.

Floating point (approximate) numbers come into their own when computing predicted values from a model. That's a surprisingly rare thing to do, but seems to have heavily coloured the design of early processors and language designs. Fortran anyone?

Floating point is also a "Swiss Army knife" numeric type that meets most requirements for a non-integer type by making acceptably pragmatic compromises between performance, precision, and range.

Quote from dandl on June 16, 2019, 2:27 pm

 

I think you may have clipped some context. That should be read as "every measurement is an exact number". You cannot make a measurement that is a fraction (1/3) or a transcendental (sin(0.1)). All measurements are exact numbers, to whatever precision is available, and with whatever error may be inherent in the method. This remains true after simple arithmetic that produces a result in the same or a derived unit, and should be enforced by judicious rounding.

I think our difference must be terminological.  I certainly make measurements of 1/3 cup (just short of 80ml; note that U.S. cooks mostly measure by volume rather than weight) of milk, water, oil, etc.  To me exact means 'error-free'.

 

Quote from Dave Voorhis on June 16, 2019, 10:42 am

I might quarrel with some of John's choice of terminology, e.g., "expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types." (emphasis mine) -- where "selected values of TTM types" would be preferable to "instances of TTM scalar types" to avoid even a hint of TTM/OO mashup -- and I can't find the Instant as it's described above. Instead, I find https://docs.oracle.com/javase/9/docs/api/java/time/Instant.html but its "static methods" do seem to somewhat resemble the description above.

I agree that "values of TTM types" would have been better.  Keeping terminology straight among multiple communities is difficult, especially when concepts overlap and there is no exact synonymy.  I also blurred the distinction on the OO side between constructors and factory methods (operators that return values of a type but accept only argument(s) of some other type(s)), as it makes no difference in the context of immutable classes.

However, John's point -- that the facility for multiple constructors for an immutable value class in a typical object-oriented language is notionally isomorphic to TTM's multiple possreps -- is sound:

I appreciate your saying so, though I would say they are the same concepts tout court.

Quote from Dave Voorhis on June 16, 2019, 10:42 am
Quote from AntC on June 16, 2019, 9:08 am
Quote from johnwcowan on June 15, 2019, 9:59 pm

expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types.

Just. No. TTM does not have 'objects', that is, it does not have values encapsulated inside things whereby you must call methods to access the values. TTM's scalar types, Selector-defined types with components, attribute values and their types are all plain compare-by-value not compare-by-reference.

The Java immutable class Instant exposes no less than six, taking these respective arguments:

  1. the number of seconds since the Epoch,
  2. the number of milliseconds since the Epoch,
  3. the number of seconds and the number of nanoseconds within that second since the Epoch
  4. ...

There's nothing corresponds to that in TTM. Just. No. I'm on the verge of saying "not even wrong".

I might quarrel with some of John's choice of terminology, e.g., "expose multiple constructors for their objects, which iff immutable are exactly instances of TTM scalar types." (emphasis mine) -- where "selected values of TTM types" would be preferable to "instances of TTM scalar types" to avoid even a hint of TTM/OO mashup ...

Sorry Dave, I don't think you have avoided polluting this with OO concepts. I can't reconcile what you claim with TTM's Pres and Pros.

However, John's point -- that the facility for multiple constructors for an immutable value class in a typical object-oriented language is notionally isomorphic to TTM's multiple possreps -- is sound:

In a typical object-oriented language, multiple constructors for a given class may be used to create various instances which have an unspecified and hidden internal representation, but that appropriately reflects the chosen constructor.

Firstly there's Hugh's repeated and vehement objection to regarding TTM Selectors as equivalent to OO constructors.

Would I be right in thinking that invoking one of these constructors:

a) Allocates a patch of memory;

b) Executes some code to put values into that memory;

c) Returns a pointer to that memory?

OK it's an immutable value ex hypothesi, but it's a pointer (contra OO Pro 2) not a value simpliciter.

Would I be further right in thinking that distinct program code that invokes the same constructor with the same arguments would also allocate a different patch of memory and return a different pointer value? Then

All instances of that class can be compared with each other as if they had all been instantiated with the same constructor.

There might be a comparison operator (however it's spelled) that says the referents at the ends of those pointers are the same value. But this being OO, there's also a differently-spelled operator that will distinguish whether they're the same pointer and tell you they're not the same. So this fails the Leibniz equality test RM Pre 8 "It follows from this prescription that if ...", that I quoted recently.

Then OO multiple-constructors are not "isomorphic to TTM's multiple possreps". There is deliberately no operation allowed by TTM that is isomorphic to testing for pointer equality. Prefixing "notionally" is just weasel-words trying to evade a logical difference -- one of those big ones.
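The two differently-spelled operators can be seen in a short Python sketch, with Python standing in for "a typical OO language". The `Celsius` class is a hypothetical immutable value class invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Celsius:
    """Hypothetical immutable value class, used only for illustration."""
    degrees: int


a = Celsius(20)
b = Celsius(20)  # same constructor, same argument, separate allocation

same_value = (a == b)   # value comparison
same_object = (a is b)  # identity (pointer) comparison

print(same_value)   # True
print(same_object)  # False
```

Here `==` plays the role of the value comparison and `is` the role of the pointer test. The argument above is that TTM deliberately offers nothing corresponding to `is`, so two separately selected values cannot be told apart in this way.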

 
