The Forum for Discussion about The Third Manifesto and Related Matters


First and second class citizens in TTM

Page 7 of 7
Quote from Dave Voorhis on June 27, 2019, 7:01 am
Quote from AntC on June 26, 2019, 10:59 pm
Quote from Hugh on June 26, 2019, 11:08 am

in its support for UDT definitions.  So do your restricted TUPLE{...} expressions.  How do you get off the ground, so to speak, without any system-defined scalar types?

...

Types "get off the ground" because type declarations introduce both the type name and names for all the member values. The full generality of type declarations allows building types from already-defined types (as with Tutorial D user-defined types), but you need to start with the boot laces.

data Bool = False | True

is the (usual/library-supplied) declaration for type-name Bool (the definiendum) with member values False, True (sequenced that way round so that False evaluates as less than True). The keyword data (which you've complained about before: why not type?) is used because False and True are data values. (Other declarations introduce, for example, type aliases, which don't mention data values.)
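A small sketch of the same mechanism with a made-up enumeration (the type Tri and its member names are hypothetical, not from the thread): the derived Ord instance orders member values in declaration order, just as False evaluates as less than True.

```haskell
-- Hypothetical enumeration, declared the same way as the library's Bool.
-- `deriving Ord` orders the member values in declaration order;
-- `deriving (Enum, Bounded)` lets us enumerate them all.
data Tri = No | Unknown | Yes deriving (Eq, Ord, Show, Enum, Bounded)

main :: IO ()
main = do
  print (No < Yes)                      -- True: declaration order gives the ordering
  print [minBound .. maxBound :: Tri]  -- all member values: [No,Unknown,Yes]
```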

The | separator between the member values introduces a 'sum type', aka 'tagged union', as in sum-of-product types. It is (theoretically) how all base types are defined:

data Int = -9223372036854775808 | ... | -2 | -1 | 0 | 1 | 2 | ... | 9223372036854775807

data Char = '\NUL' | ... | ' ' | 'A' | 'B' | ... | 'a' | 'b' | ... | '\1114111'

If the above is "(theoretically) how all base types are defined", how are base types like Int and Char actually defined?

I've described how the semantics of Haskell is presented to learners. I'm sure there is some simplification for tutorial purposes, and there has to be some eliding of the truth to yank on the bootstraps. At least, that definition for Bool you can see verbatim in the language Prelude, that is, the standard/usual library that comes with every compiler.

Is it compiler magic (i.e., baked into the compiler), or do definitions like the above for Int and Char live somewhere in the Haskell standard library (or equivalent)?

The language standard is clearly split into two parts: 1) the formal syntax and semantics; 2) the standard libraries (which include declarations for Bool, Char, Int, Float and the usually expected numerical operations, including comparisons yielding Bool). Every compiler is expected to support/implement the standard libraries, even if some program chooses not to import them (or not all of them). Furthermore, in practice every compiler takes advantage of recognising library-defined types to generate more efficient machine code.

The syntax for Int, Float, Char (and String) values, tuples (and a few other base types) is not valid for user-defined values (which must be simple names). So yes, there is compiler magic to recognise that special syntax and turn it into internal representations of those values. But from then on, those values behave as if user-defined. A (non-parameterised/non-polymorphic) type is just a set of values, per RM Pre 1; and each value must belong to exactly one type, per RM Pre 2.
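One consequence of "behave as if user-defined": once past the compiler magic, literals can be pattern-matched like constructors. A minimal sketch (the function describe is made up for illustration):

```haskell
-- Sketch: after the compiler's literal magic, Int literals pattern-match
-- just like the constructors of an enormous sum type.
describe :: Int -> String
describe 0 = "zero"
describe 1 = "one"
describe _ = "many"

main :: IO ()
main = mapM_ (putStrLn . describe) [0, 1, 42]
```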

Either way, the answer to Hugh's question ("How do you get off the ground, so to speak, without any system-defined scalar types?") appears to be that numeric and character literals are special, baked into the compiler, and at least (if it's the latter case) predefined (by the compiler) to be notionally "typeful" to the extent that, for example, -9223372036854775808 | ... | -2 is recognised to represent a range of ordinal numeric values.

Or is that not how it works?

Two things: a) the range of type Int (in the standard library) is implementation-dependent. That's the range on my 64-bit machine. The language standard requires support for at least 31 bits. The range of type Integer (in the standard library) is arbitrary precision (implemented via a C library, e.g. GMP). Similarly the Char encoding is implementation-dependent (expected to be at least UTF-8 compliant).

But b) no, it's not 'baked in', in the sense that if you really, really want to build your own model of numeric types, the compiler will help you do that. (And that is a realistic use case: people use their own representations if they want more precision than Int or Float/Double but better efficiency than the IEEE standards -- for example to do fancy array manipulations.) The key thing is that fromInteger I mentioned (and mis-spelt as fromIntegral); there's also a fromRational. Those convert from a secret/implementation-defined numeric format to your custom format. You must supply definitions/overloadings for those two functions, as well as declaring your numeric types. (And probably those declarations will be machine-level, so you're using some escape-hatch 'Foreign Function Interface' to get outside official Haskell.)
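A minimal sketch of "build your own model of numeric types", assuming a toy modular-arithmetic type (Mod7 is made up for illustration): supplying fromInteger, the Num class method that integer literals desugar through, makes plain literals usable at the custom type directly.

```haskell
-- Hypothetical custom numeric type: integers modulo 7.
newtype Mod7 = Mod7 Integer deriving (Eq, Show)

instance Num Mod7 where
  Mod7 a + Mod7 b = Mod7 ((a + b) `mod` 7)
  Mod7 a * Mod7 b = Mod7 ((a * b) `mod` 7)
  negate (Mod7 a) = Mod7 (negate a `mod` 7)
  abs    = id
  signum (Mod7 0) = Mod7 0
  signum _        = Mod7 1
  -- The conversion from the compiler's literal format to ours:
  fromInteger n   = Mod7 (n `mod` 7)

main :: IO ()
main = print (5 + 4 :: Mod7)   -- the literals desugar via fromInteger; prints: Mod7 2
```

No Foreign Function Interface needed for this toy case, but the principle is the same: the compiler hands your fromInteger an Integer, and your type takes it from there.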

Token 1234 appearing in a program is syntactic sugar for fromInteger 1234. So the token 1234 must appear with a type annotation or typeful context that gives the return type of the expression, in order for the compiler to resolve which overloading of method fromInteger to apply. These are all valid source code:

x = 1234 :: Int   -- type annotation on the value, then 1234.0 not valid
y = 1234 :: Float -- could be written 1234.0
z :: Double       -- type annotation on the variable
z = 1234
w = sqrt z        -- type of w inferred from function sqrt applied to arg z

But I didn't say: in fromInteger 1234, what's the type of that 1234? Although method fromInteger is polymorphic/overloaded in its return type, its argument type is compiler-determined (and a user library can't override that).

If the explanation above is still not technical enough, you've probably exhausted the depths of my understanding. (As an end-user, I'm happy to accept the tutorial explanation, whilst being aware it's something of a fairy-story.)
