'Phantom type' for Attribute-Value pairs, in "popular languages" [was C# as a D]

#21 · June 23, 2020, 8:54 am

These last three responses are too combative. I'm out. You've convinced me I never want the IM (which I didn't anyway -- I've already bent too much to try to compromise with it); and I never want TTM's type system.

You haven't shown me you understand 'phantom types'. So I'll stop trying to use TTM terminology to explain them. Obviously it isn't working. Apparently TTM's type system is so far out of touch with PLT type theory as at when TTM first appeared [1998] there's just no hope of dragging it into the 1980's.

Quote from Erwin on June 23, 2020, 8:34 am

Quote from AntC on June 23, 2020, 6:29 am

Quote from Erwin on June 22, 2020, 12:20 pm

So that said, my second response is that you seem to be after a type system that "recognises" ("acknowledges") in some way the equality/"sameness" of the "individual" 1 in the (distinct) values INT(1) and, say, BIGINT(1).

No: Those have a different PhysRep. Not an example. Even less (contra DBE Ch 22) do I want to tangle up INT(1) with RAT(1). What I do want to do is say bare 1 as a literal appearing in program text is ambiguous as to what type denotes[**], so the source context/type inference must disambiguate.

[**] that ambiguity is entirely consistent with my revised wording for RM Pre 1 above.

"Different physrep" is not an answer and it shows you don't understand the nature of type systems as systems of [manipulation of] ***abstract*** values.

What the Physrep looks like ***should be left to the type designer/definer***. As a consequence, you cannot know what the physrep is, and as an ensuing consequence, you cannot possibly know that they are different. Some architectures represent the number one as 0x0001, others as 0x0100. And type systems are targeted at ***INSULATING*** users from such differences.

#22 · June 23, 2020, 8:55 am

Quote from AntC on June 23, 2020, 6:29 am

Quote from Erwin on June 22, 2020, 12:20 pm

***Within*** the IM, I believe there is a stated requirement that for types that are in a subtype/supertype relationship wrt one another, it ***must*** be a PROPER subtype/PROPER supertype relationship, with an acknowledgement that the system can never be supposed to verify that, so it's the type designer's responsibility.

Seems Rel doesn't comply, from Dave's experiment.

Rel is compliant to the extent that Tutorial D is compliant, because it's Tutorial D. The supertype/subtype requirement is for S-by-C, not UNION types, isn't it?

Some aspects of the model can only be dictated as policy, because no reasonable implementation can feasibly computationally guarantee them.

This example of subtyping-by-constraint is guaranteed not to comply:

TYPE Overlapper POSSREP {x INT};
TYPE BigUns IS {Overlapper 
  CONSTRAINT THE_x(Overlapper) > 10
  POSSREP {_x = THE_x(Overlapper)}
};
TYPE SmallUns IS {Overlapper 
  CONSTRAINT THE_x(Overlapper) < 20
  POSSREP {_x = THE_x(Overlapper)}
};

There are values of Overlapper -- Overlapper(11) through Overlapper(19) -- that would notionally be both BigUns and SmallUns, which the model disallows. But it's computationally infeasible to automatically identify this, so it must be left to the programmer. The results are undefined and thus implementation-dependent, but it's my fault; I broke the rules.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#23 · June 23, 2020, 8:59 am

Quote from AntC on June 23, 2020, 8:54 am

These last three responses are too combative. I'm out. You've convinced me I never want the IM (which I didn't anyway -- I've already bent too much to try to compromise with it); and I never want TTM's type system.

So you've said, numerous times. That's fine; you're not obligated to use it. I'm sure even the requirement to use the IM if you implement inheritance would be happily overlooked as long as your system is logical and consistent.

I think the IM is interesting to explore, even if other approaches may be more practically -- and theoretically -- advisable.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#24 · June 23, 2020, 1:58 pm

You haven't shown me you understand 'phantom types'. So I'll stop trying to use TTM terminology to explain them. Obviously it isn't working. Apparently TTM's type system is so far out of touch with PLT type theory as at when TTM first appeared [1998] there's just no hope of dragging it into the 1980's.

I'm not sure I understand phantom types at anything beyond a superficial level, and reading the paper wasn't that helpful. Can you explain what the term means without using any TTM (or Haskell) specific terminology? And why one should care?

Andl - A New Database Language - andl.org

#25 · June 24, 2020, 1:16 am

Quote from AntC on June 24, 2020, 1:16 am

Quote from dandl on June 23, 2020, 1:58 pm

You haven't shown me you understand 'phantom types'. So I'll stop trying to use TTM terminology to explain them. Obviously it isn't working. Apparently TTM's type system is so far out of touch with PLT type theory as at when TTM first appeared [1998] there's just no hope of dragging it into the 1980's.

I'm not sure I understand phantom types at anything beyond a superficial level, and reading the paper wasn't that helpful. Can you explain what the term means without using any TTM (or Haskell) specific terminology? And why one should care?

A typical use case (discussed in the Cheney & Hinze paper) is DSELs (Domain-Specific Embedded Languages, which is a possible approach to embedding D-ness in a "popular programming language"). The DSEL implies a type system, but that might cut across/be incompatible with the host language's type system. As the interpreter is parsing embedded language constructs, it builds up a typing for the embedded expressions, to make sure the expressions are type-correct as well as syntactically correct. So it wants to annotate the AST with the type at each node. But those are purely types needed at compile time; not the values/nodes in the AST.

Another example would be size-indexed collections: some operations on collections apply only for non-empty data structures (delete element, max/min); we'd rather statically guarantee those will succeed rather than throwing a runtime exception; OTOH it would be horribly clumsy to declare two different data structures for empty vs non-empty -- especially since typically a structure starts empty and gets elements inserted. So maintain a type-level tag of the element-count for the structure; operations that add elements (insert, append) must increment the count; operations that remove must decrement. To emphasise: this must be statically at compile time; not at runtime because that could only deliver a runtime exception.

File handlers/channels and state, especially files/channels that a program doesn't necessarily open, or only open according to runtime configuration, or only open if triggered by content of other files. Typically you'll get a runtime exception if you try to read/write a file that's not open, or close a file that's not open or open a file that's already open. Again we'd rather statically guarantee a program won't throw those sort of exceptions. So add a type-level (phantom) tag with the file state.

All these examples assume value-semantics, not pointer semantics: if you change the state of some data structure, that means change its (phantom) type, which means return a new structure with a different type, not reassign the new structure to an existing variable. So that's typically achieved via Continuation-Passing Style, or equivalent.

And that implies type-level despatch/overloading of operators: there's two overloadings for open-file (say), opening an already-open file (type) is a no-op, opening a non-open file properly opens it and returns a differently-typed file handle.

Quote from dandl on June 23, 2020, 1:58 pm

You haven't shown me you understand 'phantom types'. So I'll stop trying to use TTM terminology to explain them. Obviously it isn't working. Apparently TTM's type system is so far out of touch with PLT type theory as at when TTM first appeared [1998] there's just no hope of dragging it into the 1980's.

I'm not sure I understand phantom types at anything beyond a superficial level, and reading the paper wasn't that helpful. Can you explain what the term means without using any TTM (or Haskell) specific terminology? And why one should care?

A typical use case (discussed in the Cheney & Hinze paper) is DSELs (Domain-Specific Embedded Languages, which is a possible approach to embedding D-ness in a "popular programming language"). The DSEL implies a type system, but that might cut across/be incompatible with the host language's type system. As the interpreter is parsing embedded language constructs, it builds up a typing for the embedded expressions, to make sure the expressions are type-correct as well as syntactically correct. So it wants to annotate the AST with the type at each node. But those are purely types needed at compile time; not the values/nodes in the AST.

Another example would be size-indexed collections: some operations on collections apply only for non-empty data structures (delete element, max/min); we'd rather statically guarantee those will succeed rather than throwing a runtime exception; OTOH it would be horribly clumsy to declare two different data structures for empty vs non-empty -- especially since typically a structure starts empty and gets elements inserted. So maintain a type-level tag of the element-count for the structure; operations that add elements (insert, append) must increment the count; operations that remove must decrement. To emphasise: this must be statically at compile time; not at runtime because that could only deliver a runtime exception.

File handlers/channels and state, especially files/channels that a program doesn't necessarily open, or only open according to runtime configuration, or only open if triggered by content of other files. Typically you'll get a runtime exception if you try to read/write a file that's not open, or close a file that's not open or open a file that's already open. Again we'd rather statically guarantee a program won't throw those sort of exceptions. So add a type-level (phantom) tag with the file state.

All these examples assume value-semantics, not pointer semantics: if you change the state of some data structure, that means change its (phantom) type, which means return a new structure with a different type, not reassign the new structure to an existing variable. So that's typically achieved via Continuation-Passing Style, or equivalent.

And that implies type-level despatch/overloading of operators: there's two overloadings for open-file (say), opening an already-open file (type) is a no-op, opening a non-open file properly opens it and returns a differently-typed file handle.

#26 · June 24, 2020, 1:31 pm

Thanks, that's pretty helpful.

The first example does sound like what I've been on about: the DSEL sees headings as phantom types and tracks type inference across relational operations; the host language just sees relations and tuples as blind types to deal with generically.

The other two are harder. I get that a file operation always returns an object of a known type: open or closed, but a collection? I don't see how the type system can count.

Are there any working implementations?

Andl - A New Database Language - andl.org

#27 · June 25, 2020, 4:11 am

Quote from dandl on June 24, 2020, 1:31 pm

I don't see how the type system can count.

Are there any working implementations?

You use Peano-style arithmetic.

data ZNat -- a phantom type representing Peano Zero

data SNat p -- a phantom type representing the Successor to Peano p

type family AddNat p1 p2 where { AddNat ZNat p2 = p2; AddNat (SNat p) p2 = SNat (Add Nat p p2) }

A type family is a type-level function 'computed' at compile time by type inference. Similarly there's type familys for other arithmetic operations, and comparisons.

Implementations: type-level arithmetic has proved so effective, not to say ubiquitous, there's now built-in machinery in the GHC compiler; including recognising type 1 as a shorthand for (SNat ZNat), etc. (Note Haskell syntax always distinguishes between a program region using the type namespace vs the term/value namespace, so no confusion with type Int/Float etc.)

#28 · June 25, 2020, 7:23 am

Quote from AntC on June 25, 2020, 4:11 am

Quote from dandl on June 24, 2020, 1:31 pm

I don't see how the type system can count.

Are there any working implementations?

You use Peano-style arithmetic.

data ZNat -- a phantom type representing Peano Zero

data SNat p -- a phantom type representing the Successor to Peano p

type family AddNat p1 p2 where { AddNat ZNat p2 = p2; AddNat (SNat p) p2 = SNat (Add Nat p p2) }

A type family is a type-level function 'computed' at compile time by type inference. Similarly there's type familys for other arithmetic operations, and comparisons.

Implementations: type-level arithmetic has proved so effective, not to say ubiquitous, there's now built-in machinery in the GHC compiler; including recognising type 1 as a shorthand for (SNat ZNat), etc. (Note Haskell syntax always distinguishes between a program region using the type namespace vs the term/value namespace, so no confusion with type Int/Float etc.)

I guess that makes sense, if you know how big the collection will be at compile time. I don't see how it plays out in the general case.

Re DSELs, do you know of any examples (preferably not just in the Haskell world)?

Andl - A New Database Language - andl.org

The Forum for Discussion about The Third Manifesto and Related Matters