Tutorial D – Relational Model Number Type

#1 · March 23, 2020, 4:56 pm

I've been reading some of Chris Date's work recently, and he mentioned how the INTEGER and RATIONAL data types were corruptions of the pure relational model because they are leakage from the computer's physical layer.

That got me thinking: would it be possible to introduce a NUMBER data type into the Tutorial D language that would match an end user's mental model? I then remembered seeing Douglas Crockford (the JS: The Good Parts / JSON guy) talk about DEC64. He was talking about his recommendations for features in new programming languages, and the DEC64 website outlines its advantages. Perhaps DEC64 could be used to provide a NUMBER type in Tutorial D? (Imaginary numbers would still have to be a separate type, though.)

I'm guessing there would be some penalty to pay, such as more main memory required or fewer numbers able to be stored in registers, but from a model perspective it could be a real advantage.

I'd like to know your thoughts.

#2 · March 23, 2020, 8:58 pm

Starting on roughly page 2 of this thread https://forum.thethirdmanifesto.com/forum/topic/summarize-per-outer-join-and-image-relations is a recent discussion (bit of a thread hijack, actually) about NaN in Tutorial D and Rel and related topics. It's not directly related to your idea, but might have some relevance.

Offhand, and without much thought about it, I don't see anything that would in principle be a problem with implementing NUMBER in an implementation of Tutorial D.

Replacing INTEGER and RATIONAL in the official specification of Tutorial D would, however, render examples in the various Date & Darwen publications obsolete. Tutorial D implementations would likely add NUMBER but keep INTEGER and RATIONAL, in order to maximise compatibility. That's certainly what I'd do with Rel.

But is it worth it for what is primarily a pedagogical and illustrative language?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

#3 · March 24, 2020, 12:29 am

TD made a number of choices in its pursuit of its perceived target audience, perhaps not all good ones. RATIONAL is not one of the good ones, but that's an interesting topic in its own right.

I regard the number definition from JSON as the best available. NUMBER is simply a string of decimal digits containing a decimal point. Leading and trailing zeros are ignored. Optionally, a decimal scale factor may be applied, but is never required.

A practical NUMBER type should/may include a single non-numeric NAN value, perhaps represented as a decimal point with no digits or an empty string of digits. NAN = NAN.

INTEGER is a NUMBER with no decimal point or scale factor. It can be treated as a subtype of NUMBER, or a type in its own right.
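As a rough illustration of the NUMBER definition described above, here is a minimal Python sketch. The names `number` and `NAN` are hypothetical and are not part of Tutorial D, Rel, or Andl; Python's `decimal.Decimal` merely stands in for an exact decimal NUMBER.

```python
from decimal import Decimal


def number(text: str) -> Decimal:
    """Hypothetical NUMBER constructor: parse a decimal string exactly."""
    return Decimal(text)


class _Nan:
    """Hypothetical single non-numeric NAN value (an assumption of this sketch)."""

    def __repr__(self) -> str:
        return "NAN"


NAN = _Nan()

# Leading and trailing zeros do not change the value.
assert number("007.500") == number("7.5")
# An optional decimal scale factor is just another spelling of the same value.
assert number("75e-1") == number("7.5")
# An INTEGER is a NUMBER with no decimal point or scale factor.
assert number("42") == number("42.000")
# The single NAN compares equal to itself, unlike an IEEE NaN.
assert NAN == NAN
assert float("nan") != float("nan")
```

The sentinel object is used only because Python's own `Decimal("NaN")` follows the IEEE convention and compares unequal to itself.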

REAL64 is a 64-bit IEEE floating point value. It can be treated as a subtype (disregarding NaN/INF etc) of NUMBER, or a type in its own right. It should only be used where performance is a dominant issue. It bites.

DEC64 is a much better idea than REAL64, and is a subtype of NUMBER.
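For readers who have not seen DEC64, the following Python sketch shows the layout as I understand it from the DEC64 site: a 56-bit signed coefficient in the high bits and an 8-bit signed exponent in the low bits, with the value being coefficient × 10^exponent. The function names are hypothetical, the exponent range of -127 to 127 is my reading of that description, and NaN handling is deliberately left out.

```python
from fractions import Fraction


def dec64_pack(coefficient: int, exponent: int) -> int:
    """Pack a coefficient and exponent into a 64-bit integer (sketch only)."""
    if not -(1 << 55) <= coefficient < (1 << 55):
        raise OverflowError("coefficient does not fit in 56 signed bits")
    if not -127 <= exponent <= 127:
        raise OverflowError("exponent out of range")
    # Two's complement coefficient in the high 56 bits, exponent in the low 8.
    return ((coefficient & ((1 << 56) - 1)) << 8) | (exponent & 0xFF)


def dec64_unpack(bits: int) -> tuple:
    """Recover (coefficient, exponent) from a packed 64-bit integer."""
    coefficient = bits >> 8
    if coefficient >= 1 << 55:  # sign-extend the 56-bit field
        coefficient -= 1 << 56
    exponent = bits & 0xFF
    if exponent >= 128:  # sign-extend the 8-bit field
        exponent -= 256
    return coefficient, exponent


def dec64_value(bits: int) -> Fraction:
    """Exact value represented: coefficient * 10**exponent."""
    coefficient, exponent = dec64_unpack(bits)
    return Fraction(coefficient) * Fraction(10) ** exponent


one_and_a_half = dec64_pack(15, -1)
assert dec64_value(one_and_a_half) == Fraction(3, 2)
```

Because the exponent is a power of ten, values such as 1.5 or 0.1 are stored exactly, which is the property that makes the format attractive as a candidate for NUMBER.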

Andl - A New Database Language - andl.org

#4 · March 24, 2020, 2:44 am

In my view, the deepest divide in numbers is between exact rationals and inexact rationals. You can represent exact rationals with a numerator/denominator pair, or as decimal fractions, or fractions to any other base. The problem with them is that in order to maintain exactness the numbers get larger and larger.
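The growth problem is easy to see with Python's `fractions.Fraction`, used here only as a stand-in for an exact rational type. The `harmonic` helper is a hypothetical example chosen because its denominators grow quickly.

```python
from fractions import Fraction


def harmonic(n: int) -> Fraction:
    """Exact sum 1/1 + 1/2 + ... + 1/n as a numerator/denominator pair."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


assert harmonic(5) == Fraction(137, 60)

# The result stays exact, but the representation keeps getting bigger.
for n in (5, 10, 20, 40):
    h = harmonic(n)
    print(n, "terms:", len(str(h.denominator)), "digits in the denominator")
```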

Therefore, long ago inexact rationals were invented. The operators applicable to them produce results of bounded size, but the results are often inaccurate. The underlying base can be 2, as is common today, or 16, which IBM mainframes use, or 10, as the IBM 1130 did and which has been standardized by the IEEE but, AFAIK, not implemented on any widely used hardware platform (the IBM z machines and the POWER6 chip are exceptions). If you don't insist on too many significant digits, base-10 floats aren't any more accurate than base-2 floats, and they definitely require more hardware support.
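A small Python sketch of the same point: binary floats and decimal floats both have bounded size, and both round. The 16-digit precision is an arbitrary choice for the illustration, not something taken from any standard discussed here.

```python
from decimal import Decimal, localcontext

# Base 2: 0.1, 0.2 and 0.3 are not exactly representable, so the sum is off.
assert 0.1 + 0.2 != 0.3

# Base 10: these particular values are exact...
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# ...but a base-10 float is still inexact in general: 1/3 has to be rounded.
with localcontext() as ctx:
    ctx.prec = 16  # assumed precision, chosen only for this example
    third = Decimal(1) / Decimal(3)
    back = third * 3

assert back != Decimal(1)
```

Changing the base moves the set of values that happen to be exact; it does not remove the rounding.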

In practice, INTEGER is a subtype of the exact rationals, RATIONAL of the inexact rationals. That's the sense in which the underlying machine leaks into the model. But in practice it is not that different from the fact that you can't have an unlimited number of tuples in a relation, or of attributes in a tuple.

#5 · March 24, 2020, 5:11 am

Generally I agree. It's nice to have very fast FP built into the hardware, but most of us would rather not use it most of the time.

[BTW I forgot to mention REAL32, which is widespread in graphics programming, and is heaps better than the integers that were so common in earlier days. It tends to live only in arrays/vectors/matrices etc, and causes few of the problems of its big brother.]

The reason I like the JSON definition is that REAL64 is a subset of NUMBER, not the other way around. Every finite 64-bit floating-point value can be represented exactly by a finite string of decimal digits (17 significant digits are enough to identify it uniquely), plus either a bunch of zeros or a scale factor. Every REAL64 is an exact value, but is used as if it were a range or approximation. It's hard to do calculations taking into account that range unless you can convert them to exact values.
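To make "every REAL64 is an exact value" concrete, this Python sketch converts a float to the exact decimal and rational value it stores. It relies only on the standard `decimal` and `fractions` modules; the variable names are illustrative.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1  # the nearest REAL64 to one tenth, not one tenth itself

exact = Decimal(x)   # exact decimal expansion of the stored binary value
ratio = Fraction(x)  # the same value as a ratio with a power-of-two denominator

# Both conversions describe the same exact number.
assert Fraction(exact) == ratio
# That number is close to, but not equal to, the decimal 0.1.
assert exact != Decimal("0.1")
# The denominator is a power of two, as expected for a binary float.
assert ratio.denominator & (ratio.denominator - 1) == 0
# Seventeen significant digits are enough to recover the same REAL64.
assert float(format(x, ".17g")) == x

print(exact)
```

The exact expansion printed at the end can be long, which is worth keeping in mind when treating REAL64 as a subset of a decimal NUMBER type.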

Obviously the one big gap in all this nice theory is the irrationals. I can't count the time I've spent fixing rounding problems, even when calculating in decimal. We have 10% GST in Australia, which must be included in the retail price, so division by 11 and deciding whether to round each item, the subtotal or the grand total provide endless joy.
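The GST case can be sketched in a few lines of Python. The prices are made up, and rounding half up to the cent is an assumption of this example rather than a statement of what the tax rules require.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def gst_component(gst_inclusive: Decimal) -> Decimal:
    """GST contained in a GST-inclusive price at 10%: price / 11, to the cent."""
    return (gst_inclusive / 11).quantize(CENT, rounding=ROUND_HALF_UP)


# Hypothetical basket: three items at 0.50 each, GST included.
prices = [Decimal("0.50"), Decimal("0.50"), Decimal("0.50")]

per_item = sum(gst_component(p) for p in prices)  # round each item, then add
on_total = gst_component(sum(prices))             # add, then round once

print(per_item, on_total)  # 0.15 0.14
assert per_item != on_total
```

Both answers are computed in exact decimal arithmetic; the one-cent difference comes entirely from where the rounding is applied.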


#6 · March 24, 2020, 5:47 am

If you don't want machine abstractions to leak through, then the numeric types would all be unlimited length or precision.

Whether you have INTEGER or RATIONAL or FLOAT or whatever, the first 2 at least would not be qualified or limited by size.

As for JSON, its ONLY numeric type is a floating-point number, meaning all numbers are inexact, and this is not a model a database language should be following the example of.

Floats are fine, but only as an option, and exact numbers need to be supported as well.

#7 · March 24, 2020, 8:06 am

Quote from Darren Duncan on March 24, 2020, 5:47 am

If you don't want machine abstractions to leak through, then the numeric types would all be unlimited length or precision.

This is a non sequitur. You can place limits on size as you see fit, while paying no attention to the hardware. Incidentally, I'm careful to note that it is the IEEE standard we are talking about, even if that happens to be implemented in software on an 8-bit microprocessor.

Whether you have INTEGER or RATIONAL or FLOAT or whatever, the first 2 at least would not be qualified or limited by size.

As for JSON, its ONLY numeric type is a floating-point number, meaning all numbers are inexact, and this is not a model a database language should be following the example of.

The JSON spec is here: https://www.json.org/json-en.html. This is an abstraction, and mentions no implementation types. There are recommendations to limit precision to that of FP for portability, but it's not a requirement.

Floats are fine, but only as an option, and exact numbers need to be supported as well.

All JSON numbers are exact.
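Whether a JSON number stays exact depends on the parser, not on the text. As one illustration, Python's standard `json` module converts to binary floats by default but accepts a `parse_float` hook, so the same document can be read either way. The sample document is made up.

```python
import json
from decimal import Decimal

text = '{"price": 0.1, "big": 123456789012345678901234567890.5}'

as_float = json.loads(text)                       # default: binary floats
as_exact = json.loads(text, parse_float=Decimal)  # keep the decimal digits

# Parsed as decimals, the values are exactly what the text says.
assert as_exact["price"] == Decimal("0.1")
assert as_exact["big"] == Decimal("123456789012345678901234567890.5")

# Parsed as floats, both values have been rounded to the nearest REAL64.
assert Decimal(as_float["price"]) != Decimal("0.1")
assert Decimal(as_float["big"]) != as_exact["big"]
```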


#8 · March 24, 2020, 8:14 am

Quote from dandl on March 24, 2020, 8:06 am

All JSON numbers are exact.

That's characteristic of text-based representations, which permit richer semantics (e.g., can represent repeating decimals, special values like pi or e, etc., explicitly) in general, at the expense of performance. IEEE 754 was baked into hardware with a binary representation at a time when the conversion from text to binary for numeric operations would have been an unacceptable performance hit. For some applications, it still is.


#9 · March 24, 2020, 12:56 pm

Quote from MatTaylor on March 23, 2020, 4:56 pm

I've been reading some of Chris Date's work recently, and he mentioned how the INTEGER and RATIONAL data types were corruptions of the pure relational model because they are leakage from the computer's physical layer.

That got me thinking: would it be possible to introduce a NUMBER data type into the Tutorial D language that would match an end user's mental model? I then remembered seeing Douglas Crockford (the JS: The Good Parts / JSON guy) talk about DEC64. He was talking about his recommendations for features in new programming languages, and the DEC64 website outlines its advantages. Perhaps DEC64 could be used to provide a NUMBER type in Tutorial D? (Imaginary numbers would still have to be a separate type, though.)

I'm guessing there would be some penalty to pay, such as more main memory required or fewer numbers able to be stored in registers, but from a model perspective it could be a real advantage.

I'd like to know your thoughts.

Thanks for the suggestion but we aren't really planning on further extensions to Tutorial D. In any case the addition isn't needed for the language's intended purposes. An implementation such as Rel could of course consider such additions.

Hugh

Coauthor of The Third Manifesto and related books.
