The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

PreviousPage 9 of 22Next
Quote from dandl on October 27, 2019, 11:49 pm
Quote from Dave Voorhis on October 27, 2019, 9:12 am
Quote from dandl on October 27, 2019, 4:00 am
Quote from Dave Voorhis on October 26, 2019, 10:12 am

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

No, the TD type system is not simple. The scalar types are simple (if you ignore POSSREPs) but tuple and relation type generators are absolutely not. They are arcane and absolutely alien to most people who first see them.

Not sure what you mean. What's arcane and alien about them?

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

  1. are unique features of TTM/D -- not found in other languages (*)
  2. cannot be added to other languages (*) except by compiler modifications
  3. are named by means of a type generator, such that the name itself cannot be written
  4. cause confusion, disagreement and disbelief amongst those who ask questions about or challenge TTM, but never enthusiastic acceptance.

(*) Other languages that are fully implemented and used by at least a handful of people, and findable by Web search.

Items 1,2 and 4 also apply to POSSREPs.

POSSREPs maybe; they are a bit of a semantic hurdle for students without a reasonably strong programming background, but having taught this stuff to students who included minimally technical individuals -- for some maybe a 1st year introductory programming course -- as well as good programmers, they almost invariably found relations, tuples, and the relational algebra operators easy to understand. It wasn't unusual for students to say it was easier to understand than another language they learned at the same time: SQL.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from dandl on October 27, 2019, 11:49 pm

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

  1. are unique features of TTM/D -- not found in other languages (*)
  2. cannot be added to other languages (*) except by compiler modifications
  3. are named by means of a type generator, such that the name itself cannot be written

Haskell and Julia have named tuples and sets thereof, and both languages are used by far more than a handful.

 

Quote from johnwcowan on October 28, 2019, 12:04 am
Quote from dandl on October 27, 2019, 11:49 pm

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

  1. are unique features of TTM/D -- not found in other languages (*)
  2. cannot be added to other languages (*) except by compiler modifications
  3. are named by means of a type generator, such that the name itself cannot be written

Haskell and Julia have named tuples and sets thereof, and both languages are used by far more than a handful.

I was sure someone would mention Haskell. As I understand it, the regular released version of GHC Haskell cannot support the TTM tuples and relation types. As I recall there are namespace issues for field names, and record types are not first class.

There are numerous compiler modifications around that extend Haskell in various directions, and amongst those are tools that can, more or less. At best it all works just fine, but not many people are using it. At worst, it's still a work in progress. Best to look to @AntC for more informed comment than mine.

Even if Haskell can do it, it's the only one and it's right on the edge.

Andl - A New Database Language - andl.org
Quote from dandl on October 28, 2019, 12:58 am
... Best to look to @AntC for more informed comment than mine.
OK I'll do my best. (I was about to disagree anyway with David's "cannot be added to other languages(*)"
Quote from johnwcowan on October 28, 2019, 12:04 am
Quote from dandl on October 27, 2019, 11:49 pm

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

  1. are unique features of TTM/D -- not found in other languages (*)
  2. cannot be added to other languages (*) except by compiler modifications
  3. are named by means of a type generator, such that the name itself cannot be written

Haskell and Julia have named tuples and sets thereof, and both languages are used by far more than a handful.

I was sure someone would mention Haskell. As I understand it, the regular released version of GHC Haskell cannot support the TTM tuples and relation types.

So I need to digress about "regular released version(s) of ... Haskell". There is a formal Language Report standard, currently 2010 being minor tweaks to 1998. Nobody programs in just that. (Well, beginners might on introductory courses.) There's a bunch of language extensions stable since 2006 or before, available in the two major compilers: GHC and Hugs. There's unfortunately some subtle differences between those two compilers in how they handle corner cases of the interaction of different extensions. But ...

TTM tuples can be mimicked with those long-standing/stable extensions, as demonstrated in 'HList' 2004, where they're called "type-indexed products". Now the 2004 paper bumped into some difficulties in those corner cases: so I said above "since 2006", because that's when they got sorted out in GHC (but the 2004 paper wasn't re-published/revised, even though the resulting software package was. You can download HList from Hackage.)

It wasn't sorted out in Hugs until ... 2018. You can download a compiler mod from ... me.

As I recall there are namespace issues for field names, and record types are not first class.

You're possible mixing up two or three different things: standard Haskell record/tuple-types with named components have namespace issues; there's been a bunch of compiler mods to ease the pain, but it hasn't gone away. Those record types are first-class (i.e. product types).

HList uses a different mechanism altogether. There are no namespace issues; but there are type coherence challenges. Those challenges I attribute to poor implementation in GHC of some of those extensions (particularly where they interact). The Hugs compiler has a much more coherent implementation of those extensions that does not risk type coherence (but means you must write your code in a rather restricted form: there's code that GHC accepts and probably runs OK, fingers crossed; but that Hugs rejects -- because it's not prepared to rely on crossed fingers). My modified Hugs compiler preserves the coherence features.

There's a third record-alike system available in Hugs only 'TRex' Typed Records with extensibility. No namespace issues; coherent type inference; it's arguable whether they're first-class, because they're lacking some of the extensibility that would make them generically-typed.

There are numerous compiler modifications around that extend Haskell in various directions, and amongst those are tools that can, more or less. At best it all works just fine, but not many people are using it.

Lots of people are using packages built over HList. Total 13k+ downloads, 283 in the last 30 days.

At worst, it's still a work in progress. Best to look to @AntC for more informed comment than mine.

Even if Haskell can do it, it's the only one and it's right on the edge.

No it's not the only one: there's several experimental Haskell-like compilers/languages. The possibly-promising direction is to learn from dependent-typing theory (languages/tools like Agda, Coq, Idris). It's currently a research challenge to figure out how those languages' type-solving/inference can be implemented on an industrial language -- the problem looks NP-hard in general: is there a limited form that is expressive enough without making type inference blow up?

The essential difficulty for all those typing theories is statically preserving the uniqueness of attribute names within a TTM-style tuple: you need to express that attribute name doesn't appear elsewhere in the tuple; it's difficult to express the type of 'elsewhere in the tuple', given the attribute's offset is unknown; it's difficult to express 'doesn't appear' because type theory is oriented to constructing types (constructivist logic doesn't like negations), not deconstructing types to look for absence conditions.

To pick up on a previous point from David

Items 1,2 and 4 also apply to POSSREPs.

They might apply to multiple PossReps (and esp the requirements for total mapping between representations). A single PossRep is just a product type with named components, which is a feature in many languages and I haven't found to cause confusion.

Quote from AntC on October 28, 2019, 3:54 am
Quote from dandl on October 28, 2019, 12:58 am
... Best to look to @AntC for more informed comment than mine.
OK I'll do my best. (I was about to disagree anyway with David's "cannot be added to other languages(*)"
Quote from johnwcowan on October 28, 2019, 12:04 am
Quote from dandl on October 27, 2019, 11:49 pm

I think you're being ingenuous, but I'll bite. The tuple and relation types defined by TTM:

  1. are unique features of TTM/D -- not found in other languages (*)
  2. cannot be added to other languages (*) except by compiler modifications
  3. are named by means of a type generator, such that the name itself cannot be written

Haskell and Julia have named tuples and sets thereof, and both languages are used by far more than a handful.

I was sure someone would mention Haskell. As I understand it, the regular released version of GHC Haskell cannot support the TTM tuples and relation types.

So I need to digress about "regular released version(s) of ... Haskell". There is a formal Language Report standard, currently 2010 being minor tweaks to 1998. Nobody programs in just that. (Well, beginners might on introductory courses.) There's a bunch of language extensions stable since 2006 or before, available in the two major compilers: GHC and Hugs. There's unfortunately some subtle differences between those two compilers in how they handle corner cases of the interaction of different extensions. But ...

Re-reading this post, somebody landing on it from outer space might think Haskell is in a parlous state: how can tuples be so hard? I'd better tell some of the back story.

Of course in Haskell (as in any language) you could pass tuple access requests as strings, and hold tuples as association lists of arbitrary length. Then access time is O(n/2) on average (n is the number of elements in the tuple). Far worse is that there's no guarantee that the element labelled QTY is actually a numeric value. So access can fail at run-time -- either because there's no element with the label you've asked for, or far worse there's a type error/memory violation; and not to mention the overheads of pointers down the association list.

So the ideal is the record's elements are held as a vector of values; the index into it is determined statically at compile time, so access time is O(1); there's no need to hold the labels (type erasure semantics). That's what Haskell's standard record mechanism does. The labels amount to offsets into the record's vector. (That's how the namespace issues arise: each label must uniquely identify one record type, with one offset; then it's hard to support multiple record types having the same-named label; the compiler can't be sure of identifying at compile time what's the record type.)

HList is a half-way house: tuples are held as association lists with the structure determined at compile time. If from the access request it can be determined at compile time which label, that can be compiled to O(1) access time, effectively 'pre-walking' the pointers. If not, access is O(n/2) but at least access is usually guaranteed type safe -- unless you trip over coherence issues. Because of the potential need for dynamic access, the memory footprint is greater than a vector: pointers and type tags.

TRex holds records as vectors, with labels type-erased; access time is O(1). Dynamic run-time access is supported, providing you provide a label literal. What you can't do is generic operations like: does this record have a label same as that record? Do they have the same element type at that label? Are the elements equal? In which it would be a type error to test for equality without type-proving the answers to the earlier qs.

TTM tuples can be mimicked with those long-standing/stable extensions, as demonstrated in 'HList' 2004, where they're called "type-indexed products". Now the 2004 paper bumped into some difficulties in those corner cases: so I said above "since 2006", because that's when they got sorted out in GHC (but the 2004 paper wasn't re-published/revised, even though the resulting software package was. You can download HList from Hackage.)

It wasn't sorted out in Hugs until ... 2018. You can download a compiler mod from ... me.

As I recall there are namespace issues for field names, and record types are not first class.

You're possible mixing up two or three different things: standard Haskell record/tuple-types with named components have namespace issues; there's been a bunch of compiler mods to ease the pain, but it hasn't gone away. Those record types are first-class (i.e. product types).

HList uses a different mechanism altogether. There are no namespace issues; but there are type coherence challenges. Those challenges I attribute to poor implementation in GHC of some of those extensions (particularly where they interact). The Hugs compiler has a much more coherent implementation of those extensions that does not risk type coherence (but means you must write your code in a rather restricted form: there's code that GHC accepts and probably runs OK, fingers crossed; but that Hugs rejects -- because it's not prepared to rely on crossed fingers). My modified Hugs compiler preserves the coherence features.

There's a third record-alike system available in Hugs only 'TRex' Typed Records with extensibility. No namespace issues; coherent type inference; it's arguable whether they're first-class, because they're lacking some of the extensibility that would make them generically-typed.

 

Quote from AntC on October 28, 2019, 3:54 am

At worst, it's still a work in progress. Best to look to @AntC for more informed comment than mine.

Even if Haskell can do it, it's the only one and it's right on the edge.

No it's not the only one: there's several experimental Haskell-like compilers/languages. The possibly-promising direction is to learn from dependent-typing theory (languages/tools like Agda, Coq, Idris). It's currently a research challenge to figure out how those languages' type-solving/inference can be implemented on an industrial language -- the problem looks NP-hard in general: is there a limited form that is expressive enough without making type inference blow up?

The essential difficulty for all those typing theories is statically preserving the uniqueness of attribute names within a TTM-style tuple: you need to express that attribute name doesn't appear elsewhere in the tuple; it's difficult to express the type of 'elsewhere in the tuple', given the attribute's offset is unknown; it's difficult to express 'doesn't appear' because type theory is oriented to constructing types (constructivist logic doesn't like negations), not deconstructing types to look for absence conditions.

OK, it's nice to know that Haskell can (with some difficulty) handle what Dave's students find trivially easy, and it's nice to know there are a few other people on the planet interested in solutions to the problem, but I stand by my observations. The tuples and relations of RM Pre 6 and 7 are arcane and highly unusual, if not totally alien. They don't fit nicely into most other reasonably common languages.

To pick up on a previous point from David

Items 1,2 and 4 also apply to POSSREPs.

They might apply to multiple PossReps (and esp the requirements for total mapping between representations). A single PossRep is just a product type with named components, which is a feature in many languages and I haven't found to cause confusion.

Obviously I meant POSSREPs as per RM Pre 4, which offers no such choice. Excluding specific features is just an acknowledgment that RM Pre 4 too is arcane, highly unusual, if not totally alien.

But it's easy to fix RM Pre 4 to the point of making it fully compatible with type systems found in other languages. It's not so easy to fix or replace RM Pre 6 and 7. I might start a new thread for that.

Andl - A New Database Language - andl.org
Quote from dandl on October 26, 2019, 11:28 pm
Quote from Brian S on October 26, 2019, 4:19 pm

Domains are therefore not merely type aliases.  Instead, types are in fact abstract domains, or should I say scalar types are abstract domains.

I think that might have been Codd's intention. I'm not sure I follow your argument, or that it matters a whole lot.

My only problem with TTM and the IM is its treatment of empty relations.  At least one of the arguments for explicitly including the heading is that the MST would need to be computed by reading billions of tuples.  But that's an implementation issue, not a matter of semantics.  It is redundant to require what can be determined from the contents to be explicitly specified, and the only values whose MST cannot be determined by their contents are those being or containing empty relation values.

Under the IM the MST of a new relational value can only be determined by inspecting each and every tuple, regardless. The heading has no influence on that. The trick is to avoid doing so unnecessarily. The MST will be needed for some comparisons but not all, and for some function calls but not all.

And no, the heading cannot be determined from the content, and the heading is required in order to infer the 'declared' type of a computed relational value.

The heading of non-empty relations can be determined from the content.  The process is actually spelled out in the prescriptions.

It's important to distinguish between values and expressions.  Consider this: Values have a most-specific type; expressions do not.  Expressions have a declared type; values do not--except for empty relation values.  Doesn't that strike you as odd?

The declared type of the result of a relational expression can be determined from the declared types of its operands.  There is no computational requirement for empty relations to have a declared type.  It's important to distinguish between values and expressions.  Consider this:

VAR X REL {A INT, B INT};  X := REL {A INT, B INT}{};

VAR Y REL {A EVEN, B EVEN}; Y := REL {A EVEN, B EVEN}{TUP{2, 4}};

VAR Z REL {A INT, B INT}; Z = X UNION Y;

What is the MST of the value in Z ?  Is it REL{A INT, B INT} or REL{A EVEN, B EVEN} ? Obviously, it is REL{A EVEN, B EVEN} because the only tuple in the result of the union has type TUP {A EVEN, B EVEN}.  The declared type of the expression X UNION Y is REL{A INT, B INT}.  It is computed from the declared types of its operands, but the MST of a value should not depend on the context at which it appears.

It's important to distinguish between values and expressions.  Is the expression REL {A INT, B INT}{} identical to the expression REL {A CHAR, B CHAR}{} ?   Obviously not, because the declared types of the expressions differ.  One cannot assign REL {A CHAR, B CHAR}{} to relvar X above.  But the MST of the result of evaluating the literal [expression] REL {A INT, B INT}{} is REL{A OMEGA, B OMEGA}, and the MST of the result of evaluating the literal REL {A CHAR, B CHAR}{} is also REL{A OMEGA, B OMEGA}.  Consider this:

Should it be valid to TREAT_AS_TYPE REL {A INT, B INT} the literal REL{A CHAR, B CHAR}{} ?

Why should it be any less valid to TREAT_AS_TYPE REL{A INT, B INT} the literal REL {X CHAR, Y CHAR, Z CHAR}{} ?  What is gained by having a separate empty relation value for each heading?

Brian

Quote from Dave Voorhis on October 26, 2019, 11:41 pm
Quote from Brian S on October 26, 2019, 4:19 pm

I'm all for going back to fundamentals, but I don't think we should throw out the baby with the bathwater.

Let's go back to the fundamentals.  Domains are what predicates range over.  Some domains are abstract, such as integers and character strings [...]

Domains are therefore not merely type aliases.  Instead, types are in fact abstract domains, or should I say scalar types are abstract domains.

"Abstract domains ..."

You keep using that phrase. I do not think it means what you think it means.

(With apologies to The Princess Bride.)

I am not a mathematician.  What the term "domain" means in the context of domain theory has absolutely no bearing on my usage here.  My usage here is that a domain is merely a collection of elements of the Universe of Discourse.  In classical logic there is only one domain--the Universe of Discourse--and every parameter of every predicate ranges over the entire Universe.  When the logic is typed, on the other hand, the parameters of predicates are typed, and range over collections of elements of the Universe which are not necessarily the entire Universe.

Now, there are two kinds of elements in the Universe: those that can have a location in time or in time and space, and those that cannot.  Those that can have a location in time are called "concrete."  Those that cannot are called "abstract."  Don't read anything else into it.  An abstract domain contains only abstract elements of the Universe.

Brian

Quote from Brian S on November 1, 2019, 3:19 am

Under the IM the MST of a new relational value can only be determined by inspecting each and every tuple, regardless. The heading has no influence on that. The trick is to avoid doing so unnecessarily. The MST will be needed for some comparisons but not all, and for some function calls but not all.

And no, the heading cannot be determined from the content, and the heading is required in order to infer the 'declared' type of a computed relational value.

The heading of non-empty relations can be determined from the content.  The process is actually spelled out in the prescriptions.

I'm finding your use of the terminology difficult, because it's not clear what exactly you're saying. What is a 'relation'? A variable, value, expression, argument, etc?

Heading is a syntactic construct used in the construction of tuple and relation types. You can infer a heading from a type, and every value belongs to one or more types, each of which does or does not have a heading. Is that what you mean?

It's important to distinguish between values and expressions.  Consider this: Values have a most-specific type; expressions do not.  Expressions have a declared type; values do not--except for empty relation values.  Doesn't that strike you as odd?

The compiler view: various constructs (variables, operators, literals) are declared to have some type. Somewhere in TIRT there is discussion about extending the concept of declared type to include expressions, because the compiler can always trace every expression back to the declared elements from which it was constructed. The compiler checks for type consistency of comparisons and operator invocations using the declared type, because that's what it knows. That isn't odd at all.

It follows that values never have a declared type, but the expressions that produce them do.

The runtime view: every value belongs to some type, or (given the IM) more than one. One of those will be its MST. The MST takes part in the specifics of comparing values and in dispatching to operators, based on the runtime value. The declared type is long gone -- only the compiler knew that.

The declared type of the result of a relational expression can be determined from the declared types of its operands.  There is no computational requirement for empty relations to have a declared type.  It's important to distinguish between values and expressions.  Consider this:

I have no idea what the middle (underlined) sentence means. If an empty relation value is the product of some expression, it will have the declared type of that expression (subject to the note above).

VAR X REL {A INT, B INT};  X := REL {A INT, B INT}{};

VAR Y REL {A EVEN, B EVEN}; Y := REL {A EVEN, B EVEN}{TUP{2, 4}};

VAR Z REL {A INT, B INT}; Z = X UNION Y;

What is the MST of the value in Z ?  Is it REL{A INT, B INT} or REL{A EVEN, B EVEN} ? Obviously, it is REL{A EVEN, B EVEN} because the only tuple in the result of the union has type TUP {A EVEN, B EVEN}.  The declared type of the expression X UNION Y is REL{A INT, B INT}.  It is computed from the declared types of its operands, but the MST of a value should not depend on the context at which it appears.

Z has the value { 4,4 }. The MST of the value 4 here is EVEN, so the MST of Z is { EVEN,EVEN }. This depends only on the value (within a specific type hierarchy), not on its context.

It's important to distinguish between values and expressions.  Is the expression REL {A INT, B INT}{} identical to the expression REL {A CHAR, B CHAR}{} ?   Obviously not, because the declared types of the expressions differ.  One cannot assign REL {A CHAR, B CHAR}{} to relvar X above.  But the MST of the result of evaluating the literal [expression] REL {A INT, B INT}{} is REL{A OMEGA, B OMEGA}, and the MST of the result of evaluating the literal REL {A CHAR, B CHAR}{} is also REL{A OMEGA, B OMEGA}.

The compiler will not permit this comparison, because the types are disjoint. The MST is irrelevant.

Consider this:

Should it be valid to TREAT_AS_TYPE REL {A INT, B INT} the literal REL{A CHAR, B CHAR}{} ?

Why should it be any less valid to TREAT_AS_TYPE REL{A INT, B INT} the literal REL {X CHAR, Y CHAR, Z CHAR}{} ?  What is gained by having a separate empty relation value for each heading?

Sorry, meaningless. Neither will get past the compiler.

 

Andl - A New Database Language - andl.org
Quote from Brian S on November 1, 2019, 3:55 am

Now, there are two kinds of elements in the Universe: those that can have a location in time or in time and space, and those that cannot.  Those that can have a location in time are called "concrete."  Those that cannot are called "abstract."  Don't read anything else into it.  An abstract domain contains only abstract elements of the Universe.

While I know what you mean, that definition isn't really well formed.  It relegates Sherlock Holmes and Frodo Baggins to the realm of abstracta, whereas they are plainly as concrete as you or me.  What's worse is that the physical universe itself (as distinct from the Universe of Discourse) winds up as abstract, as it has no location in time and space: one may say it is time and space.

PreviousPage 9 of 22Next