The Forum for Discussion about The Third Manifesto and Related Matters


RFC: "DOMAIN" types in Tutorial D

Acceptable?
Yes, as is.
Yes, with changes. (See posts for details.)
No. (See posts for reasons.)
Other. (See posts for details.)

The discussion in https://forum.thethirdmanifesto.com/forum/topic/codd-1970-domain-does-not-mean-date-2016-type-was-burble-about-dates-im/ led me to suggest, somewhat in passing, syntax and semantics for DOMAIN types in Tutorial D. Now that I think about it, they might actually be a good idea, so I thought I'd copy the original idea here, particularly as it's no longer very theoretical and is now very TD-specific.

The proposal here is to add DOMAIN types to Tutorial D.

A DOMAIN type creates a new type which enforces nominal type safety, whilst retaining the semantics of a specifically-referenced other type. For example, assuming type INT already exists (i.e., is the Tutorial D built-in INTEGER type) and has the usual semantics, the declaration

   DOMAIN MyType INT

states that MyType provides the interface of an INT but is not type compatible with INT or any other DOMAIN <name> INT unless explicitly cast to an INT.

E.g., given:

DOMAIN MyType INT;
DOMAIN AnotherType INT;
VAR MyVar REAL RELATION {x INT, y MyType, z MyType, q AnotherType} KEY {x};

EXTEND MyVar: {p := x + y} would fail with a domain (type) mismatch error.

EXTEND MyVar: {p := x + CAST_AS_INT(y)} would be valid.

EXTEND MyVar: {p := y + z} would be valid.

EXTEND MyVar: {p := q + z} would fail with a domain (type) mismatch error.

This succinctly enforces type safety where it's desired, such as preventing accidental multiplication of an INTEGER Quantity by an employee's INTEGER Age (define Quantity to be of type DOMAIN Qty INT and Age to be DOMAIN Age INT), preventing accidental addition of an employee's MONEY Salary to a product's MONEY Price, or preventing accidental JOIN of a product's CHAR Code with an employee's CHAR Code. At the same time, it avoids the overhead and attendant complexity/verbosity of defining a distinct type such as TYPE MyType POSSREP {val INT}, or with the IM, TYPE MyType IS {INTEGER}.

The syntax is simply:

   DOMAIN <new type name> <existing scalar type name>

We assume DROP TYPE would work on DOMAIN types.

A definition of DOMAIN <new type name> <existing scalar type name> would (perhaps automatically) co-define an OPERATOR CAST_AS_<existing scalar type name>(<new type value>) RETURNS <existing scalar type value> to explicitly cast a given DOMAIN value to its underlying type.
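For readers who want to experiment with these rules outside Tutorial D, the following is a minimal Python sketch of the proposed behaviour, not an implementation of it. The helper names `domain` and `cast_as_int` are hypothetical stand-ins for the DOMAIN declaration and the co-defined CAST_AS_INT operator, and only `+` is modelled.

```python
# Analogy only: Python stand-ins for the proposed DOMAIN semantics.
def domain(name, base):
    """Create a nominal type over `base` (hypothetical helper)."""

    class Domain:
        def __init__(self, value):
            # The selector accepts only values of the underlying type.
            if not isinstance(value, base):
                raise TypeError(f"{name} selector expects a {base.__name__} value")
            self._v = value

        def __add__(self, other):
            # Only values of exactly the same domain are compatible.
            if type(other) is not type(self):
                raise TypeError(f"domain mismatch: {name} + {type(other).__name__}")
            return type(self)(self._v + other._v)

        def __repr__(self):
            return f"{name}({self._v!r})"

    Domain.__name__ = name
    return Domain


def cast_as_int(value):
    """Stand-in for the co-defined CAST_AS_INT operator."""
    return value._v


MyType = domain("MyType", int)
AnotherType = domain("AnotherType", int)
x, y, z, q = 1, MyType(2), MyType(3), AnotherType(4)

print(y + z)               # same domain: allowed
print(x + cast_as_int(y))  # explicit cast: allowed

for label, attempt in [("x + y", lambda: x + y), ("q + z", lambda: q + z)]:
    try:
        attempt()
    except TypeError as err:
        print(label, "rejected:", err)
```

Note that Python checks these rules at run time, whereas the proposal presumably intends them to be checked when the expression is compiled.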

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on October 24, 2019, 10:11 am

[...]

I think this is a nice idea (and I'm bouncing it off Chris) but I'm afraid I don't want to add anything major to the published TD definition. In any case, we would be very wary of using DOMAIN as a keyword anywhere in the language. Can't we just add something to TYPE, such as TYPE MyType AS INT? That avoids having to provide DROP DOMAIN (or teach people that they can use DROP TYPE for that purpose).

I'm reminded of the SQL standard's "distinct type" construct, which serves a similar purpose, though I wouldn't try to emulate it exactly.

I'd like more information concerning casting, comparisons, and assignments, given your "nominal type safety" and "retain the semantics".

Would you have:

CAST_AS_MyType(i), where i is of type INT?
CAST_AS_MyType(ot), where ot is of type OtherType (also based on INT)?
MT := i, where MT is of type MyType and i is of type INT?
MT = i, where MT is of type MyType and i is of type INT?
MT < i, etc., where MT is of type MyType and i is of type INT?

and, using your proposed syntax rather than mine:

DOMAIN ExtraType MyType?
DOMAIN ExtraType POINT, where POINT is a scalar type defined with a possrep?
Types defined as subtypes of domains?
And are domains first-class types?

Hugh

 

Coauthor of The Third Manifesto and related books.
Quote from Hugh on October 24, 2019, 1:05 pm
[...]

 

I only suggest DOMAIN because of the discussion that led to this post -- the one at https://forum.thethirdmanifesto.com/forum/topic/codd-1970-domain-does-not-mean-date-2016-type-was-burble-about-dates-im/ -- but I'm certainly not attached to it.  I'd be equally happy with TYPE MyType AS INT, particularly as it's really a collection of shorthands for TYPE MyType POSSREP {v INT} and associated references and mechanisms.

In answer to your questions:

  1. CAST_AS_MyType(i) where i is INT would not be allowed. It's redundant per item 3 below. I.e., use the selector.
  2. CAST_AS_MyType(ot), where ot is of some other type OtherType, whether based on INT or not, would not be allowed.
  3. MT := i, where MT is of type MyType and i is of type INT would not be allowed. Instead, use the selector MyType(i) where i is of type INT. E.g., MT := MyType(3) is valid.
  4. MT = i, MT < i, etc., where MT is of type MyType and i is of type INT would not be allowed.
  5. DOMAIN ExtraType MyType would be allowed.
  6. DOMAIN ExtraType POINT would be allowed.
  7. Types defined as subtypes of domains are allowed, because domains are types.
  8. Domains are precisely the same class as TYPEs, because domains are types.

 

Quote from Dave Voorhis on October 24, 2019, 1:25 pm

[...] particularly as it's really a collection of shorthands for TYPE MyType POSSREP {v INT} and associated references and mechanisms.

That's what concerns me: it's just syntactic sugar (which as is well known causes cancer of the semicolon).   You can declare such domains just as above, and cast them with THE_V.  What is gained except a few less characters in the declaration and a few more at the point of use?

Quote from johnwcowan on October 24, 2019, 1:47 pm
Quote from Dave Voorhis on October 24, 2019, 1:25 pm

[...] particularly as it's really a collection of shorthands for TYPE MyType POSSREP {v INT} and associated references and mechanisms.

That's what concerns me: it's just syntactic sugar (which as is well known causes cancer of the semicolon).   You can declare such domains just as above, and cast them with THE_V.  What is gained except a few less characters in the declaration and a few more at the point of use?

You get a few less characters in the declaration and a few less at the point of use, as long as you use the same domain -- which, of course, is the intent.

E.g., given DOMAIN MyType INT you can do this:

DOMAIN MyType INT;

VAR x INIT(MyType(3));
VAR y INIT(MyType(4));
VAR z INIT(MyType(2));

WRITELN x + y * z;

The usual alternative is somewhat more awkward and verbose:

TYPE MyType POSSREP {v INT}; 

VAR x INIT(MyType(3));
VAR y INIT(MyType(4)); 
VAR z INIT(MyType(2)); 

WRITELN THE_v(x) + THE_v(y) * THE_v(z); 
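The difference in verbosity between the two styles can be seen in a rough Python analogy. This is only a sketch under assumptions: the class names `MyTypePossrep` and `MyTypeDomain` are hypothetical, attribute access `.v` stands in for THE_v, and only `+` and `*` are modelled for the domain-style type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MyTypePossrep:
    """Analogue of TYPE MyType POSSREP {v INT}: arithmetic goes through v."""
    v: int


@dataclass(frozen=True)
class MyTypeDomain:
    """Analogue of DOMAIN MyType INT: arithmetic is closed within the type."""
    v: int

    def __add__(self, other):
        if type(other) is not MyTypeDomain:
            return NotImplemented
        return MyTypeDomain(self.v + other.v)

    def __mul__(self, other):
        if type(other) is not MyTypeDomain:
            return NotImplemented
        return MyTypeDomain(self.v * other.v)


# POSSREP style: unwrap every operand, result is a plain int.
px, py, pz = MyTypePossrep(3), MyTypePossrep(4), MyTypePossrep(2)
possrep_result = px.v + py.v * pz.v

# DOMAIN style: operate directly, result stays in the domain.
dx, dy, dz = MyTypeDomain(3), MyTypeDomain(4), MyTypeDomain(2)
domain_result = dx + dy * dz

print(possrep_result)
print(domain_result)
```

One difference the sketch makes visible: the POSSREP-style expression yields a value of the underlying type, while the DOMAIN-style expression yields a value of the new type.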

 


Just noticed a potential issue. Assuming the MyType DOMAIN mentioned above, with VARs x, y, and z all of MyType, you can do WRITELN x + y * z as noted above. If you create an operator NewOp (INT r) RETURNS INT then you can, of course, do this: WRITELN NewOp(x + y * z). But you're forced, per the rules so far, to do this: WRITELN x + MyType(y) * z;

I'm starting to think any potential benefits may be outweighed by confusions...

Quote from Dave Voorhis on October 24, 2019, 2:40 pm

Just noticed a potential issue. Assuming the MyType DOMAIN mentioned above, with VARs x, y, and z all of MyType, you can do WRITELN x + y * z as noted above. If you create an operator NewOp (INT r) RETURNS INT then you can, of course, do this: WRITELN NewOp(x + y * z). But you're forced, per the rules so far, to do this: WRITELN x + MyType(y) * z;

I'm starting to think any potential benefits may be outweighed by confusions...

The problem is in the poor (too vague, too superficial, insufficiently fleshed out) definition of what it means to "expose the same interface".

Let's take as an example integer addition, which I presume is regarded as "part of the interface".  As an operator, it has a signature like +(INT;INT) and return type INT.  Name or invocation symbol ; type specs of the arguments ; return type.

What would you do with this particular operator when making MyType "inherit" that ?  "Obviously", you're going to turn that into an operator with signature like +(MYTYPE;MYTYPE) and return type MYTYPE.  Change every INT you find.  That's natural and obvious, innit ?  Well, no.  What logical grounds do you have for changing all three ?  What logical grounds do you have for not keeping, say, the return type as is and turning it into +(MYTYPE;MYTYPE) with return type INT ?  Tip : you're doing that because you are looking at a specific example where that "looks like" the sensible choice, and that in turn is because you probably subconsciously picked that example because the scenario matches the desired results ***for that example case***.

As a second example, take the operator with signature SUBSTR(CHAR;INT;INT) and return type CHAR.  Hugh often used this example, claiming it "existed for type INT", to stress the notion that operators are not "bundled with" types.  So that being the case, what grounds do you have for deciding that this particular operator is (or is not) "part of the interface (of type INT)" ?  And if you decide that it is, does automatically creating an operator SUBSTR(CHAR;WEIGHT;WEIGHT) and return type CHAR look like it's achieving your goals of type safety ?

As a last one, consider EXP(FLOAT) with return type FLOAT.  There cannot be any doubt this operator is part of "the interface of" FLOAT.  But does automatically creating an operator EXP(LENGTH) with return type LENGTH look like achieving your goals of type safety ?
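The ambiguity in "inheriting" an operator can be made concrete by enumerating the candidates. The following Python sketch is illustrative only: `candidate_liftings` is a hypothetical helper, and signatures are modelled as plain tuples of type names. It lists every signature obtainable by replacing some occurrences of the base type with the new type.

```python
from itertools import product


def candidate_liftings(name, params, returns, base, new):
    """Return every signature obtained by replacing some occurrences
    of `base` (in parameters or return type) with `new`."""
    slots = list(params) + [returns]
    positions = [i for i, t in enumerate(slots) if t == base]
    results = []
    for choice in product([False, True], repeat=len(positions)):
        lifted = list(slots)
        for pos, swap in zip(positions, choice):
            if swap:
                lifted[pos] = new
        results.append((name, tuple(lifted[:-1]), lifted[-1]))
    return results


plus = candidate_liftings("+", ("INT", "INT"), "INT", "INT", "MYTYPE")
substr = candidate_liftings("SUBSTR", ("CHAR", "INT", "INT"), "CHAR", "INT", "WEIGHT")
exp = candidate_liftings("EXP", ("FLOAT",), "FLOAT", "FLOAT", "LENGTH")

for op_name, params, returns in plus:
    print(f"{op_name}({';'.join(params)}) returns {returns}")
print(len(plus), len(substr), len(exp))
```

Each occurrence of the base type doubles the number of candidates (including the unchanged signature), and nothing in the signature itself says which candidate is the intended one.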

 

Quote from Dave Voorhis on October 24, 2019, 2:40 pm

Just noticed a potential issue. Assuming the MyType DOMAIN mentioned above, with VARs x, y, and z all of MyType, you can do WRITELN x + y * z as noted above. If you create an operator NewOp (INT r) RETURNS INT then you can, of course, do this: WRITELN NewOp(x + y * z). But you're forced, per the rules so far, to do this: WRITELN x + MyType(y) * z;

I'm starting to think any potential benefits may be outweighed by confusions...

My sketch earlier in that previous thread

DOMAIN S# RENAMES INT DERIVES {Eq, Ord};

was aimed at using INTs as identifiers (which is common); and in that case you definitely don't want to support arithmetic. Even using INTs as lengths/weights, your proposal avoids supporting adding a length to a weight OK, but doesn't avoid multiplying two weights. Whereas you might well want to multiply a QTY by a weight.

Perhaps a simpler rule would be: if you want to do arith on MyType, you must CAST_AS_INT( ) on all the values; do the arith as INT; cast back afterwards? That pushes the onus on to the programmer to think about their units. But then MyType is not a first-class type.

Addit: the rule could be: domain-types must support equality comparison (because all types must, per RM Pre 8); assignment (RM Pre 21); other comparison operators (or not) as per their based-on type (RM Pre 22). Stop.
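As a rough illustration of that narrower rule, here is a Python sketch of an identifier-style domain that supports equality and ordering within the same domain but no arithmetic. The names `identifier_domain`, `SNo` and `PNo` are hypothetical (S# is not a valid Python identifier), and Python enforces the rule at run time rather than at compile time.

```python
from functools import total_ordering


def identifier_domain(name):
    """Create a nominal identifier type: Eq and Ord only, no arithmetic."""

    @total_ordering
    class Ident:
        def __init__(self, value):
            self._v = value

        def _require_same_domain(self, other):
            if type(other) is not type(self):
                raise TypeError(f"domain mismatch: {name} vs {type(other).__name__}")

        def __eq__(self, other):
            self._require_same_domain(other)
            return self._v == other._v

        def __lt__(self, other):
            self._require_same_domain(other)
            return self._v < other._v

        def __hash__(self):
            return hash((name, self._v))

        def __repr__(self):
            return f"{name}({self._v!r})"

    Ident.__name__ = name
    return Ident


SNo = identifier_domain("SNo")  # supplier numbers
PNo = identifier_domain("PNo")  # part numbers

print(SNo(1) == SNo(1))
print(SNo(1) < SNo(2))

for label, attempt in [
    ("SNo + SNo", lambda: SNo(1) + SNo(2)),
    ("SNo = PNo", lambda: SNo(1) == PNo(1)),
]:
    try:
        attempt()
    except TypeError as err:
        print(label, "rejected:", err)
```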

Otherwise I agree with Erwin's comment that TTM/Tutorial D doesn't have a thoroughgoing theory/mechanism for operator overloading, so trying to graft one on is going to be ad-hoc and clunky.

Quote from Erwin on October 24, 2019, 8:33 pm
Quote from Dave Voorhis on October 24, 2019, 2:40 pm

Just noticed a potential issue. Assuming the MyType DOMAIN mentioned above, with VARs x, y, and z all of MyType, you can do WRITELN x + y * z as noted above. If you create an operator NewOp (INT r) RETURNS INT then you can, of course, do this: WRITELN NewOp(x + y * z). But you're forced, per the rules so far, to do this: WRITELN x + MyType(y) * z;

I'm starting to think any potential benefits may be outweighed by confusions...

The problem is in the poor (too vague, too superficial, insufficiently fleshed out) definition of what it means to "expose the same interface".

In a language with multiple dispatch, a type's interface notionally consists of all operators that have a parameter of that type -- and, yes, it grows as operators referencing the type are created -- but it is admittedly a poor choice of terminology and not particularly helpful.

Quote from Erwin on October 24, 2019, 8:33 pm
[...]

Yes, this looks like another form of benefit-outweighing confusion introduced by my DOMAIN. The fix would ostensibly be some way of identifying which operators that accept an INT parameter are also allowed for a given DOMAIN, but that's more complexity, and it undesirably privileges ("bundles") some operators over others.

This all makes apparent two things:

  1. A DOMAIN in whatever guise is just a type. Here the attempt is half type alias and half nominal typing mechanism. If it were no more than a type alias, it might be harmless, but it wouldn't be useful either. As a nominal typing mechanism, it doesn't really work.
  2. Everything a DOMAIN can do can be better handled, at least notionally, by those mechanisms that already exist, i.e., TYPE <name> POSSREP {v INT} or whatever, and if the IM is in effect, and where appropriate, TYPE <name> IS {INT} etc. If there are verbosity issues, there are better ways to fix them.

I hereby consign DOMAIN to the bin.

Quote from Dave Voorhis on October 24, 2019, 10:42 pm

 

In a language with multiple dispatch, a type's interface notionally consists of all operators that have a parameter of that type -- and, yes, it grows as operators referencing the type are created -- but it is admittedly a poor choice of terminology and not particularly helpful.

In multiple-dispatch languages, generic functions are not in any sense part of types.

 

  1. A DOMAIN in whatever guise is just a type. Here the attempt is half type alias and half nominal typing mechanism. If it were no more than a type alias, it might be harmless, but it wouldn't be useful either. As a nominal typing mechanism, it doesn't really work.
  2. Everything a DOMAIN can do can be better handled, at least notionally, by those mechanisms that already exist, i.e., TYPE <name> POSSREP {v INT} or whatever, and if the IM is in effect, and where appropriate, TYPE <name> IS {INT} etc. If there are verbosity issues, there are better ways to fix them.

Indeed.  What does need fixing is static type overloading of existing operators.  It should be possible for users to write operators that allow multiplying a length by a width, or a length by an int or float, and still use the * sign.  Syntactically, you could write op*, as Algol 68 does, or __mul__ as Python does, or just allow a symbolic operator to appear in the place of a <user op name>.
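For comparison, this is roughly what the Python approach mentioned above looks like. It is a sketch with hypothetical `Length` and `Area` types, and note that Python selects the overload dynamically from the operand's run-time type, not statically as asked for here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    square_units: float


@dataclass(frozen=True)
class Length:
    units: float

    def __mul__(self, other):
        # Length * Length gives an Area; Length * number scales the Length.
        if isinstance(other, Length):
            return Area(self.units * other.units)
        if isinstance(other, (int, float)):
            return Length(self.units * other)
        return NotImplemented

    def __rmul__(self, other):
        # Handles number * Length, where the left operand is a plain number.
        if isinstance(other, (int, float)):
            return Length(other * self.units)
        return NotImplemented


length, width = Length(3), Length(4)
print(length * width)  # an Area
print(length * 2)      # a Length
print(2 * length)      # a Length, via __rmul__
```

In every case the user still writes the ordinary * sign, and the result type depends on the operand types.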

In addition, DROP OPERATOR should allow an optional signature so as to only drop a single overload.  Obviously DROP OPERATOR * is undesirable.
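A minimal sketch of what dropping a single overload might look like, assuming operators are stored per name and per signature. The `OperatorRegistry` class and its `define`, `call` and `drop` methods are hypothetical and only model the idea; they are not how Rel or any Tutorial D implementation stores operators.

```python
class OperatorRegistry:
    """Toy operator catalogue keyed by name and parameter-type signature."""

    def __init__(self):
        self._overloads = {}  # name -> {signature tuple: implementation}

    def define(self, name, signature, impl):
        self._overloads.setdefault(name, {})[tuple(signature)] = impl

    def call(self, name, *args):
        # Select the overload whose signature matches the argument types exactly.
        signature = tuple(type(a) for a in args)
        try:
            impl = self._overloads[name][signature]
        except KeyError:
            raise TypeError(f"no overload of {name} for {signature}") from None
        return impl(*args)

    def drop(self, name, signature=None):
        # With a signature, drop one overload; without, drop them all.
        if signature is None:
            del self._overloads[name]
        else:
            del self._overloads[name][tuple(signature)]


ops = OperatorRegistry()
ops.define("*", (int, int), lambda a, b: a * b)
ops.define("*", (float, float), lambda a, b: a * b)

ops.drop("*", (float, float))  # removes only the float overload
print(ops.call("*", 2, 3))     # the int overload is still available
```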
