Which type?

#61 · November 14, 2021, 11:47 am

Quote from Dave Voorhis on November 14, 2021, 11:47 am

Quote from dandl on November 14, 2021, 3:54 am

Rather than (probably pointlessly) debate individual points, I'll point out that I am at least somewhat sympathetic to your approach. Though I would expect which types are available by default to be a product implementation concern only, as there isn't any theoretical basis for it. As a theoretical framework for applied implementations, TTM takes exactly the right approach in not prescribing specific types -- beyond the unavoidable boolean -- but it does appropriately prescribe how (Date & Darwen believe) types should be built.

I would expect that (or similar) ability to define any and all types in any post-SQL DBMS, completely independent of whatever types a given product implementation may or may not provide on boot-up.

In short: Provide whatever built-in types you feel are appropriate in your implementation, but they don't belong in a conceptual framework like TTM, and a reasonable product should neither compel their use nor limit the user to only that set as primitives or baseline types.

Just humour me for a moment: forget about programming languages, forget about application building, forget about the motivation of TTM and whether it is still relevant. Just think about the data.

Assume you wanted to foster a large community of data producers and consumers, all using different toolsets, and you were trying to devise a type system to facilitate data publication, interchange and reuse. It has to be easy to work with in any language, and it has to have really wide coverage, to reduce the temptation to add private types.

The lingua franca at present seems to be a choice between CSV files, Excel spreadsheets and JSON.

CSV is horrible and supports text string only; data snooping can give you boolean, text, integer, decimal, real, date/time, enum. Mistakes are really common.

Excel supports text string, real; via formats it supports integer, decimal and date/time; data snooping can give you boolean, enum.

JSON supports boolean, text string, real, and struct; data snooping can give you integer, decimal, datetime, enum.

Why would you not prefer a standard for data models and interchange based on my nine types? Currently there is no standard way to model or exchange binary strings, and many of the commonly used methods depend on data snooping.
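To make "data snooping" concrete, here is a minimal Python sketch of guessing types from CSV text. The `snoop` function and its rules are hypothetical, invented for illustration; real tools each apply their own heuristics, which is part of the problem being described.

```python
import csv
import io


def snoop(text: str):
    """Guess a type for one CSV field (hypothetical rule set, not a standard)."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    return text


raw = "id,zip,flag\n7,02134,true\n8,1e5,no\n"

# The csv module itself yields text only: every field is a str.
rows = list(csv.DictReader(io.StringIO(raw)))

# Snooping recovers some types but silently damages others.
snooped = [{key: snoop(value) for key, value in row.items()} for row in rows]
```

With these rules the postal code "02134" loses its leading zero and becomes the integer 2134, "1e5" becomes a float, and "no" stays a string because this particular rule set only recognises true/false.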

CSV files and Excel spreadsheets are what end users use for some ad-hoc, non-programmatic interchange. For that purpose, they're fine.

For programmatic implementation of data interchange, there are numerous formats, including but not limited to various JSON formats, various XML formats, various protocols, various RPC implementations, message queues, and high-level data exchange protocols like HTTP, SMTP, FTP, AMQP and so on.

I would not prefer a standard for data models and interchange based on your nine types. They are unreasonably domain-specific. At least canonical integer, float, boolean and string map to low-level hardware types for maximum performance on the hardware-facing side. Everything else -- and particularly on the data-exchange side -- should be via agreed-upon string literals.

The last thing we want is exchanging binary strings.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#62 · November 14, 2021, 12:50 pm

I see things differently. I've used all those things, and that's why I want to do things differently.

And re binary strings: we do it all the time. See attachment.

Uploaded files:

Andl - A New Database Language - andl.org

#63 · November 14, 2021, 4:15 pm

Quote from tobega on November 14, 2021, 4:15 pm

Quote from Paul Vernon on November 13, 2021, 7:23 pm

I think I'm still curious as to what the contributors to this thread (and any lurkers) actually think the answer is to my original question.

So for some undefined meaning of the word do; Do all scalar values always carry with them, at least conceptually, some identification of the type(s) to which they belong?

Valid answers are: Yes, No, Don't know, Don't care, It depends, If you want them to, It does not matter, Yes and No, Impossible to say, If you declare it so, ... or any other answer as long as it is of no more than say 5(ish) words.

Personally I think: No.

or at least, I don't think it is useful or needed to conceive that they do (OK, so that is more than 5(ish) words :-( )

I'll take a stab at this.

If you see something like 2 or 5 or V, it's all really just chicken scratchings without any value at all. It's not until you identify the type that the value becomes apparent.

So if you wrangle those words, all values must have an inherent type, otherwise they cannot be values.

Empty or NULL are both representations of nothingness, non-existence, which is a type, but does it have a value or is it the absence of value? In my Tailspin programming language, it doesn't have a representation, it is made apparent by the absence of something. A non-value cannot be propagated, it cannot cause an execution of code. To capture it, you can let a surrogate value, e.g. an empty string or a list with no elements, illustrate and carry the concept forward in a particular instance to cause code execution. (the boolean type is really not necessary as such, except as illustrated by nothingness vs somethingness)

When it comes to types in computer languages, Tony Hoare, in his 1973 keynote "Hints on Programming Language Design", advised that a computer language should not allow the programmer to accidentally apply a fixed-point operation to a bit-pattern intended to represent a floating-point number, because that would just give you the wrong answer, without any indication of it being the wrong answer. And that is really the worst possible outcome. Now there are of course other ways to ensure correctness than a type system, but I think it seems obvious that without knowing the type of the bit-pattern chicken scratchings, you cannot know the correct value.

About the word "type", I think it seems fine to say that a duck is a type of animal. And when it comes to type systems there is a school of thought that says "If it waddles like a duck, swims like a duck and quacks like a duck, it must be a duck". Even in dynamic type systems (sorry for cursing in this church), where it might not be obvious what type anything is by the code, things will tend to blow up if you try to make it quack the wrong way, e.g. by trying to add a number to a string. In Tcl, strings are the only representation, but it had better be parseable as a number if you try to add it to something. On the other hand it will also work fine as a string if you apply the concatenation operator. JavaScript is a little worse, because it will helpfully try to convert things in some "logical" way (and possibly just give you the wrong answer without you knowing). Not sure where I was going with that, except to illustrate that it is helpful for the tools we work with to carry the type along so the value is not lost in our heads.
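A small Python sketch of the "blow up if it quacks the wrong way" behaviour. Python is used only as a convenient dynamic language here; `add_like_tcl` is a hypothetical helper that loosely imitates string-only arithmetic and does not reproduce Tcl's actual rules.

```python
def add_like_tcl(a: str, b: str) -> str:
    """String-in, string-out addition: both operands must parse as integers."""
    return str(int(a) + int(b))  # int() raises ValueError for non-numeric text


# Adding a number to a string fails loudly instead of guessing.
try:
    1 + "2"
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# The same two digit strings give different results depending on the operation.
concatenated = "1" + "2"        # string concatenation
summed = add_like_tcl("1", "2")  # numeric addition carried out on strings
```

The operands are identical text in both cases; only the knowledge of which type the operation expects decides whether the result is "12" or "3".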

As for what types there should be, I think that current computer languages are way too entrenched in computer representations instead of mental concepts. Representations are not types. When it comes to the type of an attribute, why couldn't it be "a number between 2 and 5 or an english word of at most six letters or a list of those things"? Then again, that is still just expressing constraints on the representation, not capturing the essence of the type, but we might have to make some concessions to what is practically possible.

I have previously in this forum brought up the connection between types and names, and I think that within one application it would be a "smell" if the same name were used for things of different types. So perhaps a decent programming language should disallow that, to help avoid mistakes? (In Tailspin I will let a name "stick" to a string or an untyped number, so they will become identifiers of particular values of the type indicated by the name and no longer be assignable to fields with different names nor usable in arithmetic. To have numbers usable in arithmetic, you need to assign a unit of measure, e.g. "apples" or "oranges" or if it's just for counting, "1")
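The unit-of-measure idea can be sketched in Python as below. This is not Tailspin syntax or semantics; the `Quantity` class and its behaviour are assumptions made up to illustrate numbers that are only usable in arithmetic with a matching unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A number tagged with a unit of measure (hypothetical illustration)."""
    amount: int
    unit: str

    def __add__(self, other: "Quantity") -> "Quantity":
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.unit != other.unit:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.amount + other.amount, self.unit)


total = Quantity(2, "apples") + Quantity(3, "apples")

try:
    Quantity(2, "apples") + Quantity(3, "oranges")
    mixed = "allowed"
except TypeError:
    mixed = "rejected"
```

Adding apples to apples works, while apples plus oranges is rejected rather than silently producing 5 of something.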

Really diverged a bit there, but hopefully there is something worthwhile in that dump.

#64 · November 14, 2021, 5:23 pm

Quote from dandl on November 14, 2021, 12:50 pm

I see things differently. I've used all those things, and that's why I want to do things differently.

And re binary strings: we do it all the time. See attachment.

Yes, but if that was attached to an email, it would be MIME encoded. The attachment payload might be 8BITMIME encoded, but more likely Base64. Text.

And even if the attachment itself is binary, the SMTP protocol that encapsulates it is text.

The HTTP 'PUT' that got it from your workstation to this site... Text.

There's a good reason we stopped doing things differently and now use pure text or binary-encapsulated-in-text formats for data exchange: parsers for binary formats are a pain to develop and debug, particularly across machine architectures. Text formats, though the payload is sometimes larger and slower than a binary equivalent (though the transmission protocol can compress and decompress the text format), are much easier to process, easier to debug, and generally avoid machine-dependent issues.
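As a rough illustration of binary-encapsulated-in-text, the following Python sketch wraps a few arbitrary bytes in Base64 inside a JSON message and recovers them. The field names `name` and `data` and the file name are invented for the example, not part of any protocol mentioned above.

```python
import base64
import json

# Arbitrary binary payload, including bytes that are not valid text.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# Encode to ASCII text so it can travel inside a text-based message.
encoded = base64.b64encode(payload).decode("ascii")
message = json.dumps({"name": "blob.bin", "data": encoded})

# The receiver parses ordinary text, then decodes the payload back to bytes.
received = json.loads(message)
decoded = base64.b64decode(received["data"])
```

The message on the wire is plain ASCII and can be inspected with any text tool, at the cost of a larger payload than the raw bytes.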

#65 · November 14, 2021, 6:49 pm

BTW, whether you use a binary protocol/representation or a text-based one, please include an unambiguous protocol/representation version number in every message header (or equivalent). Failure to include version numbers when the protocol/representation changes -- and it will -- is a source of much error-prone technical divination, and much cursing.
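A minimal sketch of a versioned message envelope, assuming a JSON text format. The `version` and `body` field names, the supported version numbers, and the `encode`/`decode` helpers are all hypothetical choices for illustration.

```python
import json

# Versions this (hypothetical) receiver knows how to interpret.
SUPPORTED_VERSIONS = {1, 2}


def encode(body: dict, version: int = 2) -> str:
    """Wrap the body in an envelope that always states its version."""
    return json.dumps({"version": version, "body": body})


def decode(message: str) -> dict:
    """Reject messages whose version is missing or unknown instead of guessing."""
    envelope = json.loads(message)
    version = envelope.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported or missing protocol version: {version!r}")
    return envelope["body"]
```

For example, `decode(encode({"x": 1}))` returns the body, while a message with no version field, or with a version the receiver does not know, raises an error up front rather than being misread later.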

#66 · November 14, 2021, 7:08 pm

Quote from Paul Vernon on November 14, 2021, 7:08 pm
Thank you all for your answers. If you will allow, I'll stay on topic

I wonder if the term scalar value is causing some trouble. David said "value has a variety of different meanings". Let me try a different word: atom. I take the following from NFU

Not all objects in our universe are sets. Objects which are not sets are called "atoms". You can think of ordinary physical objects, for instance, as being atoms. We certainly do not think of them as being sets! Atoms have no elements, since they are not sets:

Axiom of Atoms. If x is an atom, then for all y, y ∉ x (read "y is not an element of x").

So, then my restated question, using the above definition of atom, is

Do all atoms always carry with them, at least conceptually, some identification of the type(s) to which they belong?

Now I guess the allowed answers are: Yes, No, or Yes|No but an atom is not (even vaguely close to my concept of) a scalar value.

I note in passing that NFU itself does have a concept that it calls type. This is used in "stratification" which (as far as I understand these things) is how NFU avoids paradoxes like Russell's. This is again from the above PDF (my bold), and I include it just to note that at least this set theory does not depend on atoms being typed in the TTM sense

We will use the word "type" to refer to the different "roles", sometimes qualifying it as "relative type" to remind ourselves that we do not assume the existence of different kinds of object, as in Russell's original theory, but of different levels of permitted access to one kind of object

BTW, I was wondering if another example would help (or not). Take an electron. Say the one spinning (?) round this hydrogen atom I have here. What is the type of that electron? Is it "electron"? I.e. it is a member of the set of all electrons. Does that mean it has a type, or does it not need one as "it is its own type"? By analogy, can we say a number 1 is of type 1? It is a member of the set of all ones. I mean, I'm pretty sure I don't need to know that an electron is a sub-atomic particle, that it is a lepton, that it is a charged lepton, to know what it is. It is an electron. In the same way, I don't need to know that 1 is an Integer, a Rational, an Odd, a Square etc. I know what it is: it is 1. It would still be 1 even if there were no other numbers. It is only a number because 2 exists and we can do arithmetic once we have 1 and 2, but 1 itself exists before we get to that point. 1 exists *before* we put it into a set with 2 and say, look, there is an interesting set that we can call numbers

You might be asking, what is the point of all this? Well, for me, I want to know how sound a foundation TTM is for, well, "future database systems". Now I think TTM is a fantastic piece of work. However, after not insignificant consideration, I have come to strongly believe that at least two of its starting assumptions are incorrect.

To be clear, I don't actually mind the first part of RM Pre. 2: the requirement that all values have to have (at least one) type. It is not a bad rule actually (esp. if you replace "have to" with "should"). What I object to is the idea that values *intrinsically* have a type, i.e. the (unfortunately inexplicit) axiom of TTM that a value cannot exist without it being the member of at least one set of values that is nominated as a type.

And the problem is, if you think one of someone's axioms is wrong, well you have to consider re-evaluating the whole edifice that is built on them.

P.S. Is the TTM definition of scalar value and type not dangerously circular? A type is a named, finite set of scalar values. A scalar value carries with it its type. Which comes first, the type chicken or the value egg?

P.P.S. David, you said that outside of programming the answer is "it depends". Have you got an example of a value that does not carry with it its type? (Feel free to use whatever definition of "value" you like (within reason).)

#67 · November 14, 2021, 7:35 pm

Quote from Paul Vernon on November 14, 2021, 7:08 pm

BTW, I was wondering if another example would help (or not). Take an electron. Say the one spinning (?) round this hydrogen atom I have here. What is the type of that electron? Is it "electron"? I.e. it is a member of the set of all electrons. Does that mean it has a type, or does it not need one as "it is its own type"? By analogy, can we say a number 1 is of type 1? It is a member of the set of all ones. I mean, I'm pretty sure I don't need to know that an electron is a sub-atomic particle, that it is a lepton, that it is a charged lepton, to know what it is. It is an electron. In the same way, I don't need to know that 1 is an Integer, a Rational, an Odd, a Square etc. I know what it is: it is 1. It would still be 1 even if there were no other numbers. It is only a number because 2 exists and we can do arithmetic once we have 1 and 2, but 1 itself exists before we get to that point. 1 exists *before* we put it into a set with 2 and say, look, there is an interesting set that we can call numbers

Knowing the type of a value 10 appearing in a programming language is what determines the result when you (say) divide it by another value 3. Or add it to a value "3". Or whether it's not addable or dividable at all, because it's part of a street address. Or do you treat it as a decimal value or a binary value -- maybe it's actually (in decimal) 2. Etc.
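The point about the value 10 can be shown with a few lines of Python. The variable names are only labels for this illustration; the same digits "10" lead to different results depending on which type they are taken to belong to.

```python
from fractions import Fraction

as_int_division = 10 // 3        # integers with truncating division
as_float_division = 10 / 3       # binary floating point, an approximation
as_rational = Fraction(10, 3)    # exact rational number
as_text = "10" + "3"             # character strings: concatenation
as_binary = int("10", 2)         # the digits "10" read as a binary numeral
```

Integer division gives 3, the rational result is exactly ten thirds, the text result is "103", and the binary reading of "10" is 2 in decimal.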

P.S. Is the TTM definition of scalar value and type not dangerously circular? A type is a named, finite set of scalar values. A scalar value carries with it its type. Which comes first, the type chicken or the value egg?

They are co-defined. A type is a set of values; a value is a member of a type. Consider them together.

#68 · November 14, 2021, 8:03 pm

To a mathematician and to the general public the following makes no sense

Knowing the type of a value 10 appearing in a programming language is what determines the result when you (say) divide it by another value 3

10 divided by 3 is 3¹⁄₃

there can be no other answer. integer-division is not division.

(and no, ¹⁰⁄₃ is not a number, or if it is then 3¹⁄₃ is not a number... please let's just have one canonical form for every value - there really is no good reason not to)
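One way to picture "one canonical form for every value" is Python's `fractions.Fraction`, which reduces every rational to lowest terms. The `mixed` function below is a hypothetical helper that renders that single stored value as a mixed number; it is a sketch of the idea, not a proposal for how any particular language should do it.

```python
from fractions import Fraction


def mixed(q: Fraction) -> str:
    """Render a rational as a mixed number such as '3 1/3' (illustrative helper)."""
    sign = "-" if q < 0 else ""
    whole, rest = divmod(abs(q.numerator), q.denominator)
    if rest == 0:
        return f"{sign}{whole}"
    if whole == 0:
        return f"{sign}{rest}/{q.denominator}"
    return f"{sign}{whole} {rest}/{q.denominator}"


ten_thirds = Fraction(10, 3)
same_value = Fraction(20, 6)  # stored in lowest terms, so equal to 10/3
rendered = mixed(ten_thirds)
```

Here 20/6 and 10/3 are the same stored value, and the improper fraction versus mixed number question becomes purely a matter of how that one value is displayed.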

#69 · November 14, 2021, 8:11 pm

Quote from Paul Vernon on November 14, 2021, 8:03 pm

To a mathematician and to the general public the following makes no sense

Knowing the type of a value 10 appearing in a programming language is what determines the result when you (say) divide it by another value 3

10 divided by 3 is 3¹⁄₃

there can be no other answer. integer-division is not division.

Then you have made a decision about how one aspect of your type system will work: no integer division.

That's perfectly fine, of course. Though some programmers might argue that you should provide integer division. Then it will be up to you to defend your decision, should you choose to do so. :-)

Will all your numeric values be floating point, like IEEE-854 (or similar) floats, or something else like a precise rational or decimal type?

Will you have only one numeric type, or various numeric types?

Will you have a distinct boolean type, or will you overload some numeric values for that purpose -- e.g., 0 for false and non-0 for true?

Etc.

I ask not because I'm seeking answers -- the answers are up to you -- but they are things you need to consider in any computer language that handles numeric (which, of course, is a type) values, and boolean (also a type) values, and I presume you'll have strings? Another type, that.

Of course, there are ways to design a language so the only value type is 'string', but those operators that do arithmetic with strings of digits (and the occasional decimal point?) will still have to take into account the answers to my questions.
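The questions above have visible consequences. The following Python sketch compares three numeric representations on the same sum, and shows that a string-only adder still has to pick one of them. `add_digit_strings` is a hypothetical function for illustration, and choosing `Decimal` inside it is exactly the kind of design decision being discussed.

```python
from decimal import Decimal
from fractions import Fraction

float_sum = 0.1 + 0.2                               # binary floating point
decimal_sum = Decimal("0.1") + Decimal("0.2")       # decimal arithmetic
rational_sum = Fraction(1, 10) + Fraction(2, 10)    # exact rationals


def add_digit_strings(a: str, b: str) -> str:
    """Arithmetic on strings of digits; the numeric model is chosen here."""
    return str(Decimal(a) + Decimal(b))


string_sum = add_digit_strings("0.1", "0.2")
```

The float sum is not equal to 0.3, while the decimal and rational sums are exact; the string-based operator inherits whichever behaviour its implementer selected.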

#70 · November 14, 2021, 8:47 pm

I'm not saying that I would not have types. I mean, named sets of values such as the set of integers, the set of evens, the set of primes are all very useful sets, certainly worth naming and using for constraints, or for guaranteeing that applying a function to a value that is known to be a member of such a set will return a result rather than empty. Such sets I might well call "types" for want of a better word (such as, say, domain)
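A rough Python sketch of "types as named sets", where each name maps to a membership test and a value may belong to several sets at once. All names here (`NAMED_SETS`, `memberships`, `half`) are hypothetical. Note that Python values do carry a runtime type tag, which `is_integer` relies on, so the sketch only illustrates the naming-of-sets idea and does not settle the question under debate.

```python
import math


def is_integer(v) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


def is_even(v) -> bool:
    return is_integer(v) and v % 2 == 0


def is_prime(v) -> bool:
    return is_integer(v) and v > 1 and all(
        v % d for d in range(2, math.isqrt(v) + 1)
    )


# Each "type" is just a name attached to a membership predicate.
NAMED_SETS = {"integer": is_integer, "even": is_even, "prime": is_prime}


def memberships(v) -> set:
    """Names of every set the value belongs to."""
    return {name for name, contains in NAMED_SETS.items() if contains(v)}


def half(v):
    """Guaranteed to return a result for any member of the set 'even'."""
    if not is_even(v):
        raise ValueError("half is only defined on the set of evens")
    return v // 2
```

The value 2 belongs to all three sets, 9 belongs only to "integer", and a value outside every set simply has no memberships.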

What I'm saying is that values are not intrinsically typed - they don't "carry their type(s) with them". I'm saying that types are not axiomatic. They are not foundational.

There is an important difference (a logical difference, as D&D would say) between the above two positions.

The Forum for Discussion about The Third Manifesto and Related Matters