The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Codd 1970 'domain' does not mean Date 2016 'type' [was: burble about Date's IM]

PreviousPage 7 of 22Next

With the pi-numbers, π and 1 could both be exact or inexact, but π + 1 is unavoidably inexact.

Quote from johnwcowan on October 26, 2019, 1:06 am
Quote from AntC on October 25, 2019, 10:25 pm

So previous debates have raged about cartesian vs polar coordinates: for polar, what should be the angle if the displacement is zero?

[...]  However, Pure (which is not a Scheme or even a Lisp) maintains separate internal representations for rectangular and polar complex numbers.  The discernible difference is that if you ask for the real and imaginary parts of a rectangular number or the magnitude and angle of a polar number, you get back the value you put in, whereas in the two contrary cases you might not.

That's what I did with multiple possreps in Rel -- including a Rel-specific extension to define each possrep in terms of the others to avoid the need to define each selector individually -- and it's certainly workable as long as it's understood that issues with numerical inaccuracy can complicate determining equality and "round tripping" might not work. I don't see this as being a show-stopper -- particularly as it's unavoidable in general and not specific to multiple possreps -- but maybe others do. Unsurprisingly, the subject has been a source of much debate over the years.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on October 25, 2019, 1:31 pm

Ah, that's exactly where we differ. I'm trying to dispense with the function API to access query results. I did that with the Rel client library, and whilst it's workable, it's awkward -- as DBMS-to-client API's usually are. That's why I'm re-examining tuple-as-class-instance, but only as a way to represent database tuples on the client-side and not as a general database abstraction. The "first great blunder" is not being re-blundered; it's purely a means to make client-side database-driven application coding tastier and less chewy.

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited. Had they succeeded they would have entrenched another non-portable language in the database. One is bad enough.

What if D&D had rejected the notion, avoided trying to design a new type system and a new application development language and simply gone back to the fundamentals? I suggest:

  1. Just a Data Sub-language, based closely on the RA/RC (Codd's 1972 RA would work just fine)
  2. Anything can be a Host Language (portability in our time)
  3. Build on the work of others (JSON for the DB type system? Or even SQL?)
  4. Code lives in the repo, not in the database
  5. Keep it simple, stupid!

Domains are just a way to work with a really simple type system in the database. At the end you get a familiar thing, a kind of RA-programmable ORM with no visible SQL. It's cross platform, cross language, works locally as well as on the server, it can even work on a NoSQL back end.

The devil of course is in the usual place. Maybe it's not possible, but it's fun to think about.

Andl - A New Database Language - andl.org
Quote from dandl on October 26, 2019, 9:24 am
Quote from Dave Voorhis on October 25, 2019, 1:31 pm

Ah, that's exactly where we differ. I'm trying to dispense with the function API to access query results. I did that with the Rel client library, and whilst it's workable, it's awkward -- as DBMS-to-client API's usually are. That's why I'm re-examining tuple-as-class-instance, but only as a way to represent database tuples on the client-side and not as a general database abstraction. The "first great blunder" is not being re-blundered; it's purely a means to make client-side database-driven application coding tastier and less chewy.

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited. Had they succeeded they would have entrenched another non-portable language in the database. One is bad enough.

What if D&D had rejected the notion, avoided trying to design a new type system and a new application development language and simply gone back to the fundamentals? I suggest:

  1. Just a Data Sub-language, based closely on the RA/RC (Codd's 1972 RA would work just fine)
  2. Anything can be a Host Language (portability in our time)
  3. Build on the work of others (JSON for the DB type system? Or even SQL?)
  4. Code lives in the repo, not in the database
  5. Keep it simple, stupid!

Domains are just a way to work with a really simple type system in the database. At the end you get a familiar thing, a kind of RA-programmable ORM with no visible SQL. It's cross platform, cross language, works locally as well as on the server, it can even work on a NoSQL back end.

The devil of course is in the usual place. Maybe it's not possible, but it's fun to think about.

It sounds do-able, but for the typical developer who knows SQL, what benefit does it give over SQL and the usual JDBC/ODBC/etc. client-side libraries?

I'm not sure "domains" as you've hinted at here are anything but a type alias, or the usual ORM-like mapping between the client-side and server/DBMS-side types. Or are you using the term "domain" for precisely that?

Given the IM is optional and without it the Tutorial D type system is simple, I think it can be argued that Tutorial D sans IM is that.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I'm all for going back to fundamentals, but I don't think we should throw out the baby with the bathwater.

Let's go back to the fundamentals.  Domains are what predicates range over.  Some domains are abstract, such as integers and character strings, but one cannot store people in the database, or anything else that may be located in time or in time and space.  One can, however, assign to concrete objects abstract values which can be stored in the database, or one can store in the database combinations of abstract values that describe concrete objects.

There has been talk ad nauseam of internal predicates and external predicates on this forum, but the distinction is faulty, or the term predicate is being misused.  Also, there is not just a single predicate associated with each relation variable, but rather a family of predicates--even if the schema is fully normalized both vertically and horizontally.  More importantly, each predicate in the family of a fully normalized schema ranges not necessarily over the abstract domains identified by the attribute types, but rather the concrete domains from which the concrete objects described by each key value are drawn.  For instance,

SHIFT {SHIFT#, START, STOP,  ...}, KEY {SHIFT#}

EMPLOYEE {EMP#, NAME, SHIFT#, ...}, KEY {EMP#}

EMPLOYEE[SHIFT#] ⊆ SHIFT[SHIFT#]

One predicate in the family for EMPLOYEE relates the concrete objects identified by EMP# to the concrete objects identified by SHIFT#.  That predicate is binary and ranges over two concrete domains.  There are descriptions with predicates that are instantiated by values for EMP# and SHIFT# respectively,

The shift that is identified by SHIFT#

The employee that is identified by EMP#

But the underlying predicate doesn't relate EMP# to SHIFT#, but rather elements of the concrete domain of employees to elements of the concrete domain of shifts.

So there isn't an internal predicate that is distinct from an external predicate, but instead there are concrete domains, descriptions which when instantiated denote elements of those concrete domains, and predicates that range over those concrete domains and zero or more abstract domains (types).  All predicates must range over at least one concrete domain because predicates which range only over abstract domains have extensions that cannot change.  That works fine for relation constants, but not for relation variables.

In a fully normalized schema, the only sets of attributes whose values instantiate descriptions are candidate keys and foreign keys.  How do I know this?  Because redundancy is characterized by the possibility of the same fact being recorded more than once in the database.  If a set of attributes whose values instantiate descriptions were neither a key or a foreign key, then it would be possible for there to be multiple tuples with the same subset of attribute values in the same relation, and thus the same fact would be expressed by more than one tuple.

So by all means let's get back to fundamentals.  The Universe of Discourse includes not only the abstract elements of types but also objects which may be located in time or in time and space.  A domain is a collection of objects that are share the same properties.  The domain of employees, for example is the collection of objects that satisfies the predicate "being an employee" at at least one possible world.

Domains are therefore not merely type aliases.  Instead, types are in fact abstract domains, or should I say scalar types are abstract domains.

My only problem with TTM and the IM is its treatment of empty relations.  At least one of the arguments for explicitly including the heading is that the MST would need to be computed by reading billions of tuples.  But that's an implementation issue, not a matter of semantics.  It is redundant to require what can be determined from the contents to be explicitly specified, and the only values whose MST cannot be determined by their contents are those being or containing empty relation values.

Brian

Quote from dandl on October 25, 2019, 6:27 am

That's the one I want to avoid. That's very much part of the ORM thing, and I don't see why you need it. The results of a query are a relation. A relation is a set of tuple. A tuple is an object with a set of named properties, getters and setters. Who cares what type it is?

That's precisely the dynamic-typing perspective.  A type is a set of values, and they can be manipulated with a common set of operators.  "If it walks like a duck and quacks like a duck, it is a duck", hence the term duck typing.

Absolutely. That's the whole reason for embedding a DSL in some well-known HL. Except of course, it's better if Java is not the only choice.

Are you speaking about Java in particular, or are you saying it's better not to be tied to a single language whatever it may be?

Quote from dandl on October 26, 2019, 9:24 am

OK, but bear with me. D&D chose a particular direction in the midst of the process of formalising SQL, and they chose to define a "better SQL", complete with imposing a highly novel type system on the database. Note: the database has to know about types and a whole bunch of executable code, so it really is SQL revisited.

A database might have only one type, say strings or bytevectors, but I don't see how it can have none at all.  In any case, the RM prescriptions require only one scalar type, boolean.

Build on the work of others (JSON for the DB type system? Or even SQL?)

JSON's types are boolean, inexact rational number, string, array, tuple (and of course null, which we can ignore).  In practice I think an exact-number type is also desirable, or as a bare minimum an exact-integer type.  In short, just what Tutorial D prescribes, though it is unclear whether RATIONAL means exact or inexact numbers.

As for SQL's scalar type system, it is unparalleled for jugheadedness by anything except the XML Schema scalar type system: read it and weep. In summary:

  • The 18 orthogonal primitive types of XML Schema are string, boolean, decimal, float, double, duration, dateTime, time, date, gYearMonth ("g" stands for "Gregorian"), gYear, gMonthDay, gMonth, gDay, hexBinary, base64Binary, anyURI, QName (XML-specific), and NOTATION (XML-specific and deeply obscure).
  • The 12 standard orthogonal subtypes of string are normalizedString, token, language tag, and the XML-specific types NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, and ENTITIES.
  • The 13 standard orthogonal subtypes of decimalare integer, positiveInteger, nonPositiveInteger, negativeInteger, nonNegativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, and unsignedByte.
  • These types can be constrained by the 12 standard facets (not all of which are applicable to all types) to create even more subtypes: length, minLength, maxLength, regular-expression pattern, enumeration (a fixed list of values), maxInclusive, maxExclusive, minInclusive, maxExclusive, totalDigits, and fractionDigits.
  • There are also many rules saying e.g. that a string type with a maxLength of 5 is a subtype of string with a maxLength of 10.

"Do you follow me?"
"I don't follow you."
"No matter.  To continue."
—Aristophanes, The Clouds

"Do you follow me?"
"Not a word, thank the Stars. If I even attempted to, I should bark like a dog for sheer pain of the intellect."
"Hmp.  Well, in any case [...]"
—Asimov, Pebble in the Sky

Quote from Brian S on October 26, 2019, 4:19 pm

Domains are therefore not merely type aliases.  Instead, types are in fact abstract domains, or should I say scalar types are abstract domains.

I think that might have been Codd's intention. I'm not sure I follow your argument, or that it matters a whole lot.

My only problem with TTM and the IM is its treatment of empty relations.  At least one of the arguments for explicitly including the heading is that the MST would need to be computed by reading billions of tuples.  But that's an implementation issue, not a matter of semantics.  It is redundant to require what can be determined from the contents to be explicitly specified, and the only values whose MST cannot be determined by their contents are those being or containing empty relation values.

Under the IM the MST of a new relational value can only be determined by inspecting each and every tuple, regardless. The heading has no influence on that. The trick is to avoid doing so unnecessarily. The MST will be needed for some comparisons but not all, and for some function calls but not all.

And no, the heading cannot be determined from the content, and the heading is required in order to infer the 'declared' type of a computed relational value.

Brian

 

Andl - A New Database Language - andl.org
Quote from Brian S on October 26, 2019, 4:19 pm

I'm all for going back to fundamentals, but I don't think we should throw out the baby with the bathwater.

Let's go back to the fundamentals.  Domains are what predicates range over.  Some domains are abstract, such as integers and character strings [...]

Domains are therefore not merely type aliases.  Instead, types are in fact abstract domains, or should I say scalar types are abstract domains.

"Abstract domains ..."

You keep using that phrase. I do not think it means what you think it means.

(With apologies to The Princess Bride.)

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from johnwcowan on October 26, 2019, 5:46 pm
Quote from dandl on October 25, 2019, 6:27 am

That's the one I want to avoid. That's very much part of the ORM thing, and I don't see why you need it. The results of a query are a relation. A relation is a set of tuple. A tuple is an object with a set of named properties, getters and setters. Who cares what type it is?

That's precisely the dynamic-typing perspective.  A type is a set of values, and they can be manipulated with a common set of operators.  "If it walks like a duck and quacks like a duck, it is a duck", hence the term duck typing.

Absolutely. That's the whole reason for embedding a DSL in some well-known HL. Except of course, it's better if Java is not the only choice.

Are you speaking about Java in particular, or are you saying it's better not to be tied to a single language whatever it may be?

I was responding to Dave, who seems to have a solution rather closely tied to Java. Yes, I mean it's better if the HL can be any suitably powerful language, and not tied closely to any one.

Andl - A New Database Language - andl.org
PreviousPage 7 of 22Next