The Forum for Discussion about The Third Manifesto and Related Matters

SUMMARIZE PER, OUTER JOIN and image relations

Quote from dandl on February 21, 2020, 12:08 am

...

FLOAT as per IEEE is good, except where it conflicts with TTM. IMO the default should be the TTM way (so an exception instead of NaN and friends), with IEEE behaviour as a configurable option for the desperate.

Here, worlds collide.

I've got to think about this. Configurable options in languages are almost invariably abominable.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Rene Hartmann on February 20, 2020, 10:35 pm

I recently added NaN to DuroDBMS. NaN = NaN yields true, giving RM Pre 8 priority over IEEE 754.

I've just discovered that CAST_AS_INTEGER(NaN) = -2147483648. Maybe raising an error would be better.
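The value -2147483648 is INT32_MIN. A plausible explanation (an assumption, not confirmed in the thread) is that the underlying C conversion is hitting undefined behaviour: C leaves float-to-int conversion undefined when the value cannot be represented, and on x86 the conversion instruction then yields the "integer indefinite" value 0x80000000. A minimal Python sketch of the error-raising alternative, with cast_as_integer as a hypothetical stand-in for the DuroDBMS operator and truncation toward zero assumed:

```python
import math

INT32_MIN = -2**31          # -2147483648
INT32_MAX = 2**31 - 1       #  2147483647

def cast_as_integer(x: float) -> int:
    """Convert x to a 32-bit INTEGER by truncating toward zero (an assumption),
    raising an error where the result is undefined instead of returning a sentinel."""
    if math.isnan(x):
        raise ValueError("cannot cast NaN to INTEGER")
    if math.isinf(x):
        raise OverflowError("cannot cast an infinity to INTEGER")
    result = math.trunc(x)
    # Python ints are unbounded, so enforce the 32-bit range explicitly.
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError(f"{x!r} is out of range for a 32-bit INTEGER")
    return result

print(cast_as_integer(-3.9))   # -3
for bad in (float("nan"), float("inf"), 3e9):
    try:
        cast_as_integer(bad)
    except (ValueError, OverflowError) as exc:
        print(type(exc).__name__, exc)
```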

FLOAT and RATIONAL are synonyms. Maybe one day I will separate them, using an arbitrary-precision rational type for RATIONAL. Since DuroDBMS is based on C, the GNU Multiple Precision Arithmetic Library looks like a reasonable choice.
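The difference a separate RATIONAL type would make can be sketched with Python's fractions.Fraction, used here only as a stand-in for an arbitrary-precision rational such as GMP's mpq_t (how DuroDBMS would actually expose it isn't specified in the thread):

```python
from fractions import Fraction

# Binary floating point cannot represent 1/10 exactly, so rounding error shows up.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)                  # False

# Exact rationals (arbitrary-precision numerator and denominator) have no such error.
rational_sum = Fraction(1, 10) + Fraction(2, 10)
print(rational_sum == Fraction(3, 10))   # True
print(rational_sum)                      # 3/10

# Fraction(0.1) converts the float's exact binary value, exposing the approximation.
print(Fraction(0.1) == Fraction(1, 10))  # False

# A rational type has no NaN at all: converting one is an error.
try:
    Fraction(float("nan"))
except ValueError:
    print("NaN is not a rational number")
```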

(NaN = NaN) = TRUE is very good. It means one can use a WHERE clause to discover or eliminate tuples with NaN, with good knock-on effects for JOIN, MATCHING, etc. With Rel, one can't do that.
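To see why, here is a minimal Python sketch of restriction (a WHERE clause) over a tiny relation represented as a list of dicts. The attribute names and the ttm_equal helper are hypothetical; ttm_equal treats every NaN as equal to every other NaN, ignoring payloads:

```python
import math

def ttm_equal(a: float, b: float) -> bool:
    """Equality in the spirit of RM Prescription 8: NaN equals NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b

nan = float("nan")
readings = [{"id": 1, "x": 1.5}, {"id": 2, "x": nan}, {"id": 3, "x": 2.0}]

# IEEE 754 equality: nan == nan is False, so this restriction finds nothing.
ieee_matches = [t for t in readings if t["x"] == nan]
# TTM-style equality: the NaN tuple can be discovered...
ttm_matches = [t for t in readings if ttm_equal(t["x"], nan)]
# ...or eliminated.
ttm_rest = [t for t in readings if not ttm_equal(t["x"], nan)]

print(len(ieee_matches), [t["id"] for t in ttm_matches], [t["id"] for t in ttm_rest])
# prints: 0 [2] [1, 3]
```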

I wonder what the IEEE committee thought went wrong with (NaN = NaN) = TRUE.

What about, e.g., 1.0 < NaN?  Is NaN doing double duty for both positive infinity and negative infinity?

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on February 21, 2020, 1:00 pm
Quote from Rene Hartmann on February 20, 2020, 10:35 pm

I recently added NaN to DuroDBMS. NaN = NaN yields true, giving RM Pre 8 priority over IEEE 754.

I've just discovered that CAST_AS_INTEGER(NaN) = -2147483648. Maybe raising an error would be better.

FLOAT and RATIONAL are synonyms. Maybe one day I will separate them, using an arbitrary-precision rational type for RATIONAL. Since DuroDBMS is based on C, the GNU Multiple Precision Arithmetic Library looks like a reasonable choice.

(NaN = NaN) = TRUE is very good. It means one can use a WHERE clause to discover or eliminate tuples with NaN, with good knock-on effects for JOIN, MATCHING, etc. With Rel, one can't do that.

That's correct. But NaN's behaviour in Rel is faithful to the specification of IEEE 754 floating point.

Making (NaN = NaN) = TRUE is correct according to TTM et al. but wrong according to IEEE 754.

Making (NaN = NaN) = FALSE is correct according to IEEE 754 but wrong according to TTM et al.

Adhering to TTM breaks implementation of numerical algorithms that assume IEEE 754 float behaviour, but delivers correct TTM behaviour.

Adhering to IEEE 754 allows correct implementation of numerical algorithms but breaks TTM expectations.

From a pedagogical point of view, that's excellent -- there's much teaching opportunity here.

But from a practical point of view, it's a bit of a no-win situation. I suppose I could offer FLOAT and FLOAT2, one with TTM-compliant NaN and the other with IEEE 754 compliant NaN. (There are shades of Oracle's VARCHAR and VARCHAR2 here, though that's arguably even worse.)
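The FLOAT/FLOAT2 idea amounts to two distinct types rather than one type with a mode. A minimal Python sketch, assuming the built-in float plays the IEEE 754 type and a hypothetical TtmFloat wrapper plays the TTM-compliant one; only equality and hashing are shown, and arithmetic is omitted:

```python
import math

class TtmFloat:
    """Hypothetical TTM-compliant float: NaN equals NaN, as RM Prescription 8 requires."""
    __slots__ = ("value",)

    def __init__(self, value: float) -> None:
        self.value = float(value)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, TtmFloat):
            return NotImplemented        # no implicit mixing with IEEE floats
        if math.isnan(self.value) or math.isnan(other.value):
            return math.isnan(self.value) and math.isnan(other.value)
        return self.value == other.value

    def __hash__(self) -> int:
        # Values that compare equal must hash alike, so all NaNs share one hash.
        return hash("TtmFloat-NaN") if math.isnan(self.value) else hash(self.value)

    def __repr__(self) -> str:
        return f"TtmFloat({self.value!r})"

nan = float("nan")
print(nan == nan)                            # False: IEEE 754 equality
print(TtmFloat(nan) == TtmFloat(nan))        # True: TTM equality
print(len({float("nan"), float("nan")}))     # 2: IEEE NaNs never deduplicate
print(len({TtmFloat(nan), TtmFloat(nan)}))   # 1: duplicates collapse, as in a relation
```

Because the choice lives in the type, every expression says which semantics it uses; nothing outside the operands changes the result.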

I wonder what the IEEE committee thought went wrong with (NaN = NaN) = TRUE.

I don't know. I have seen one suggestion online -- from someone who claimed to have been on the IEEE 754 standards committee -- that perhaps it should have been (NaN = NaN) = TRUE, but that it's too late to change it now.

From what I know of numerical representation folks, there will almost certainly be some among them who will argue vociferously that treating undefined as equal to undefined is ludicrous. How can undefined be "equal" to anything?!?? Etc.

I neither defend nor condemn such views; I only point out that they exist.

What about, e.g., 1.0 < NaN?  Is NaN doing double duty for both positive infinity and negative infinity?

No, there are -Inf and +Inf for that.
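This is easy to check: NaN doesn't stand in for either infinity. Under IEEE 754 it is unordered, so 1.0 < NaN and 1.0 > NaN are both false, while the infinities are ordinary ordered values. A short Python demonstration (Python floats are IEEE 754 binary64 on mainstream platforms):

```python
import math

nan, inf = math.nan, math.inf

# Every ordered comparison involving NaN is false...
print(1.0 < nan, 1.0 > nan, 1.0 == nan)   # False False False
# ...and != is the only comparison that is true.
print(1.0 != nan)                          # True
# Infinities sit at either end of the number line and compare normally.
print(-inf < 1.0 < inf)                    # True
print(inf == inf, -inf == -inf)            # True True
# NaN arises from operations with no meaningful result, e.g. inf - inf.
print(math.isnan(inf - inf))               # True
```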

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Actually there is more than one NaN. Quoting Wikipedia:

IEEE 754 NaNs are encoded with the exponent field filled with ones (like infinity values), and some non-zero number in the significand field (to make them distinct from infinity values); this allows the definition of multiple distinct NaN values, depending on which bits are set in the significand field, but also on the value of the leading sign bit (but applications are not required to provide distinct semantics for those distinct NaN values).

For example, a bit-wise IEEE floating-point standard single precision (32-bit) NaN would be

s111 1111 1xxx xxxx xxxx xxxx xxxx xxxx

where s is the sign (most often ignored in applications) and the x sequence represents a non-zero number (the value zero encodes infinities). The first bit from x is used to determine the type of NaN: "quiet NaN" or "signaling NaN". The remaining bits encode a payload (most often ignored in applications).
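The layout above can be decoded mechanically. A minimal Python sketch that classifies a raw 32-bit pattern; it works on the integer bits rather than on a Python float, because converting a signaling NaN to a float may quiet it. The quiet/signaling convention follows the IEEE 754-2008 recommendation (leading fraction bit set means quiet), which some older hardware reversed. classify_binary32 and float_to_bits are hypothetical helper names:

```python
import struct

def classify_binary32(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision pattern: s eeeeeeee fffffff...f."""
    exponent = (bits >> 23) & 0xFF        # 8 exponent bits
    fraction = bits & 0x7FFFFF            # 23 fraction (significand) bits
    sign = "-" if bits >> 31 else "+"
    if exponent != 0xFF:
        return "finite"
    if fraction == 0:
        return sign + "Inf"               # all-ones exponent, zero fraction
    kind = "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    payload = fraction & 0x3FFFFF         # remaining 22 bits
    return f"{sign}{kind} (payload {payload:#x})"

def float_to_bits(x: float) -> int:
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(hex(float_to_bits(1.0)))            # 0x3f800000
print(classify_binary32(0x7F800000))      # +Inf
print(classify_binary32(0x7FC00000))      # +quiet NaN (payload 0x0)
print(classify_binary32(0x7F800001))      # +signaling NaN (payload 0x1)
print(classify_binary32(0xFFC00001))      # -quiet NaN (payload 0x1)
```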

Interestingly, Oracle follows IEEE 754 while PostgreSQL doesn't. From the PostgreSQL documentation:

Note: In most implementations of the "not-a-number" concept, NaN is not considered equal to any other numeric value (including NaN). In order to allow numeric values to be sorted and used in tree-based indexes, PostgreSQL treats NaN values as equal, and greater than all non-NaN values.

Postgres is quite opinionated in a number of areas. In this, on balance, I would agree. Correct and consistent behaviour on sorting and naive usage outweighs blind adherence and support for cut-and-paste imported code.
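PostgreSQL's documented rule amounts to a total order in which every NaN equals every other NaN and sorts after all non-NaN values. A minimal Python sketch of that rule as a sort key (pg_sort_key and pg_equal are hypothetical names; this mimics the documented behaviour, not PostgreSQL's implementation):

```python
import math

def pg_sort_key(x: float) -> tuple[bool, float]:
    """All NaNs map to the same key, which sorts after every non-NaN key."""
    return (math.isnan(x), 0.0 if math.isnan(x) else x)

def pg_equal(a: float, b: float) -> bool:
    """Equality consistent with the ordering: NaN = NaN is true."""
    return pg_sort_key(a) == pg_sort_key(b)

nan, inf = math.nan, math.inf
values = [3.0, nan, -inf, 1.0, nan, inf]
print(sorted(values, key=pg_sort_key))   # [-inf, 1.0, 3.0, inf, nan, nan]
print(pg_equal(nan, nan), pg_equal(nan, 1.0))   # True False
```

Without such a key, sorting a list that contains NaN can silently produce a misordered result, since every ordered comparison with NaN is false.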

Andl - A New Database Language - andl.org
Quote from Dave Voorhis on February 21, 2020, 12:21 pm
Quote from dandl on February 21, 2020, 12:08 am

...

FLOAT as per IEEE is good, except where it conflicts with TTM. IMO the default should be the TTM way (so an exception instead of NaN and friends), with IEEE behaviour as a configurable option for the desperate.

Here, worlds collide.

I've got to think about this. Configurable options in languages are almost invariably abominable.

I think that's overly strong. Our commercial language product has hundreds of configurable options, some of them affecting core language semantics. It's the only way you can make sensible choices for new development while maintaining unflinching backward compatibility. We have 25-year-old code still compiling and running just fine. Aggressive versioning and deprecation seem to be the thing in the OSS communities, but some people just want their old code to keep working.

As per Postgres, I would strongly prefer my numbers to sort and collate correctly, and that means correct implementation of ordered comparisons. Others might have old code or old algorithms that depend on IEEE compliance. You can't have it both ways, and I don't want you to choose (unless you choose my way, of course!).

Andl - A New Database Language - andl.org
Quote from dandl on February 21, 2020, 11:34 pm
Quote from Dave Voorhis on February 21, 2020, 12:21 pm
Quote from dandl on February 21, 2020, 12:08 am

...

FLOAT as per IEEE is good, except where it conflicts with TTM. IMO the default should be the TTM way (so an exception instead of NaN and friends), with IEEE behaviour as a configurable option for the desperate.

Here, worlds collide.

I've got to think about this. Configurable options in languages are almost invariably abominable.

I think that's overly strong. Our commercial language product has hundreds of configurable options, some of them affecting core language semantics. It's the only way you can make sensible choices for new development while maintaining unflinching backward compatibility.

The xBase languages were notorious for having numerous language-wide configuration settings, changeable -- if you were so inclined -- via code. Visual Basic for Applications was similarly noted for its 'Option' settings that affected operator semantics within a module scope.

That meant that the semantics of a number of operators could not simply be read; they had to be determined by carefully -- and sometimes laboriously and/or slowly -- identifying the semantically significant settings at each and every point where a given setting-sensitive operator was used. It almost guarantees unpredictable and non-repeatable behaviour: semantics depend on the choice of settings, and a given choice of settings cannot be guaranteed in 3rd-party code, requires developer discipline to maintain (never a good idea, for obvious reasons), and/or has to be explicitly preserved, set, and restored at every affected boundary.

To be sensible, the semantics of every operator must be static, permanent, ideally perpetual, and defined in terms of its parameters and nothing else.
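To make the contrast concrete, a minimal Python sketch of an operator whose meaning depends on a mutable global setting (the xBase/VBA "Option" pattern described above) next to two self-contained operators. SETTINGS and all function names are hypothetical:

```python
import math

# Modal style: a hidden, mutable setting changes what the operator means.
SETTINGS = {"nan_equals_nan": False}      # hypothetical global option

def modal_equals(a: float, b: float) -> bool:
    if math.isnan(a) and math.isnan(b):
        return SETTINGS["nan_equals_nan"]  # depends on state outside the call
    return a == b

# Self-contained style: meaning is fixed by the operator and its arguments alone.
def ieee_equals(a: float, b: float) -> bool:
    return a == b

def ttm_equals(a: float, b: float) -> bool:
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b

nan = math.nan
before = modal_equals(nan, nan)
SETTINGS["nan_equals_nan"] = True         # e.g. flipped by some third-party module
after = modal_equals(nan, nan)
print(before, after)                      # False True: same call, two meanings
print(ieee_equals(nan, nan), ttm_equals(nan, nan))   # False True, always
```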

As such, "abominable" isn't strong enough. It barely touches the degree to which modal language features are appalling.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 22, 2020, 11:43 am
Quote from dandl on February 21, 2020, 11:34 pm
Quote from Dave Voorhis on February 21, 2020, 12:21 pm
Quote from dandl on February 21, 2020, 12:08 am

...

FLOAT as per IEEE is good, except where it conflicts with TTM. IMO the default should be the TTM way (so an exception instead of NaN and friends), with IEEE behaviour as a configurable option for the desperate.

Here, worlds collide.

I've got to think about this. Configurable options in languages are almost invariably abominable.

I think that's overly strong. Our commercial language product has hundreds of configurable options, some of them affecting core language semantics. It's the only way you can make sensible choices for new development while maintaining unflinching backward compatibility.

The xBase languages were notorious for having numerous language-wide configuration settings, changeable -- if you were so inclined -- via code. Visual Basic for Applications was similarly noted for its 'Option' settings that affected operator semantics within a module scope.

That meant that the semantics of a number of operators could not simply be read; they had to be determined by carefully -- and sometimes laboriously and/or slowly -- identifying the semantically significant settings at each and every point where a given setting-sensitive operator was used. It almost guarantees unpredictable and non-repeatable behaviour: semantics depend on the choice of settings, and a given choice of settings cannot be guaranteed in 3rd-party code, requires developer discipline to maintain (never a good idea, for obvious reasons), and/or has to be explicitly preserved, set, and restored at every affected boundary.

To be sensible, the semantics of every operator must be static, permanent, ideally perpetual, and defined in terms of its parameters and nothing else.

As such, "abominable" isn't strong enough. It barely touches the degree to which modal language features are appalling.

I would never aspire to anything as bad as that. Yes, it should be stamped out.

But your argument invokes a number of fallacies to avoid addressing the core issue. I say that there are specific occasions on which a language/library designer should avoid making choices that bind the user, and should offer those choices for the user to make. I have absolutely no interest in horror stories of languages in the distant past or tales of bad choices made by language designers that render code unreadable. I am interested in how to solve real problems using effective strategies.

Andl - A New Database Language - andl.org
Quote from dandl on February 23, 2020, 3:54 am
Quote from Dave Voorhis on February 22, 2020, 11:43 am
Quote from dandl on February 21, 2020, 11:34 pm
Quote from Dave Voorhis on February 21, 2020, 12:21 pm
Quote from dandl on February 21, 2020, 12:08 am

...

FLOAT as per IEEE is good, except where it conflicts with TTM. IMO the default should be the TTM way (so an exception instead of NaN and friends), with IEEE behaviour as a configurable option for the desperate.

Here, worlds collide.

I've got to think about this. Configurable options in languages are almost invariably abominable.

I think that's overly strong. Our commercial language product has hundreds of configurable options, some of them affecting core language semantics. It's the only way you can make sensible choices for new development while maintaining unflinching backward compatibility.

The xBase languages were notorious for having numerous language-wide configuration settings, changeable -- if you were so inclined -- via code. Visual Basic for Applications was similarly noted for its 'Option' settings that affected operator semantics within a module scope.

That meant that the semantics of a number of operators could not simply be read; they had to be determined by carefully -- and sometimes laboriously and/or slowly -- identifying the semantically significant settings at each and every point where a given setting-sensitive operator was used. It almost guarantees unpredictable and non-repeatable behaviour: semantics depend on the choice of settings, and a given choice of settings cannot be guaranteed in 3rd-party code, requires developer discipline to maintain (never a good idea, for obvious reasons), and/or has to be explicitly preserved, set, and restored at every affected boundary.

To be sensible, the semantics of every operator must be static, permanent, ideally perpetual, and defined in terms of its parameters and nothing else.

As such, "abominable" isn't strong enough. It barely touches the degree to which modal language features are appalling.

I would never aspire to anything as bad as that. Yes, it should be stamped out.

But your argument invokes a number of fallacies to avoid addressing the core issue. I say that there are specific occasions on which a language/library designer should avoid making choices that bind the user, and should offer those choices for the user to make. I have absolutely no interest in horror stories of languages in the distant past or tales of bad choices made by language designers that render code unreadable. I am interested in how to solve real problems using effective strategies.

There's only one effective strategy that avoids the problems I described: the semantics of every operator must be static, permanent, ideally perpetual, and self-contained, i.e., defined in terms of the operator itself plus its parameters and nothing else.

That implies one and only one effective enhancement strategy: add fixes and functionality by adding operators (and associated types, if necessary) without changing or removing the existing ones.

I'm happy to violate this in the form of changing or removing existing operators and types in a teaching tool. So I might, if appropriate, do so in Rel. The resulting conversion pain for those of us who use it in production -- and/or for teaching -- will be relatively brief but the end result will be improvement for us and others.

I appreciate that conversion processes are annoying -- and for that I apologise in advance -- but I don't think any of us who are using Rel for personal production purposes are going to have to insert control rods in the reactors and shut down the power station. At worst, an afternoon or two of gentle conversion effort, if even that.

But for commercial or commercial-grade production tools, absolutely not. The semantics of every operator must be static, self-contained (per above), permanent, and perpetual, unless it can be proven that fixing broken functionality cannot possibly break production systems. Otherwise, fixes and functionality may only be provisioned by adding operators and associated types without changing or removing existing ones.

Modal behaviour, whether programmatically alterable or only via a system configuration -- which implies operator semantics that are externally determined and defined by more than parameters and the operator itself -- is categorically forbidden. It's too wrong to be allowed in any form.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

I'm happy to violate this in the form of changing or removing existing operators and types in a teaching tool. So I might, if appropriate, do so in Rel. The resulting conversion pain for those of us who use it in production -- and/or for teaching -- will be relatively brief but the end result will be improvement for us and others.

I appreciate that conversion processes are annoying -- and for that I apologise in advance -- but I don't think any of us who are using Rel for personal production purposes are going to have to insert control rods in the reactors and shut down the power station. At worst, an afternoon or two of gentle conversion effort, if even that.

Rel is arguably a toy language, in perpetual beta, and a bit of instability is to be expected. There are many pieces of code out there that lie further along the stability/maturity spectrum, with many dependencies and significant consequences if they change. I have a fairly substantial Ruby on Rails application that I can no longer get to work, because features I depended on were deprecated and then removed. Maybe I shouldn't have done it that way, but the RoR team killed my project.

But for commercial or commercial-grade production tools, absolutely not. The semantics of every operator must be static, self-contained (per above), permanent, and perpetual, unless it can be proven that fixing broken functionality cannot possibly break production systems. Otherwise, fixes and functionality may only be provisioned by adding operators and associated types without changing or removing existing ones.

Modal behaviour, whether programmatically alterable or only via a system configuration -- which implies operator semantics that are externally determined and defined by more than parameters and the operator itself -- is categorically forbidden. It's too wrong to be allowed in any form.

Then you rule out one strategy for dealing with serious bugs and mistakes. Assume you have a language product that depends on a library that has traditionally been part of the core, something like maths, date/time, regex, crypto, something important and hard. The library has had known problems: bugs, precision, arbitrary limits, hackable, whatever.  The old library is no longer maintained/maintainable, and it is proposed to replace the library by a new one that is API compatible, but inevitably has small differences in semantics. All new users want the new improved library, all existing production users want their code to keep working as is without fail. Getting locked into an old revision of the product is not an option, because of updates in other parts of the product.

My strategy is to make the library a user choice, and the same code may work slightly differently depending on which is chosen. Your strategy, please?
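The two strategies can be contrasted in a small sketch. Assume, hypothetically, an old and a new rounding library that are API-compatible but differ on ties (half away from zero versus half to even, both available through Python's decimal module). Here the library choice is passed explicitly so it stays visible; a product following the library-choice approach might instead fix it at configuration time. All names are hypothetical:

```python
from decimal import ROUND_HALF_EVEN, ROUND_HALF_UP, Decimal
from typing import Protocol

class RoundingLibrary(Protocol):
    def round_to_int(self, x: float) -> int: ...

class OldLibrary:
    """Hypothetical legacy library: ties round away from zero."""
    def round_to_int(self, x: float) -> int:
        return int(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP))

class NewLibrary:
    """Hypothetical API-compatible replacement: ties round to even."""
    def round_to_int(self, x: float) -> int:
        return int(Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_EVEN))

# Library-choice strategy: the same code behaves slightly differently per choice.
def invoice_total(amounts: list[float], lib: RoundingLibrary) -> int:
    return sum(lib.round_to_int(a) for a in amounts)

# Additive strategy: both behaviours stay available under distinct, permanent names.
round_half_up = OldLibrary().round_to_int
round_half_even = NewLibrary().round_to_int

amounts = [0.5, 1.5, 2.5]
print(invoice_total(amounts, OldLibrary()), invoice_total(amounts, NewLibrary()))  # 6 4
print(round_half_up(2.5), round_half_even(2.5))                                    # 3 2
```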

Andl - A New Database Language - andl.org