The Forum for Discussion about The Third Manifesto and Related Matters

You need to log in to create posts and topics.

SUMMARIZE PER, OUTER JOIN and image relations

Quote from Hugh on February 17, 2020, 12:43 pm

...  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.

In Rel, I'm thinking of changing AVG on an empty list to return the canonical floating point value NaN, or Not a Number, rather than throw exceptions as it does now (except when invoking AVG as, for example, AVG {}).

I think it's more ergonomic that way and equally correct, but I'm happy to be convinced otherwise if anyone thinks it's not a change for the better.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 17, 2020, 1:54 pm
Quote from Hugh on February 17, 2020, 12:43 pm

...  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.

In Rel, I'm thinking of changing AVG on an empty list to return the canonical floating point value NaN, or Not a Number, rather than throw exceptions as it does now (except when invoking AVG as, for example, AVG {}).

I think it's more ergonomic that way and equally correct, but I'm happy to be convinced otherwise if anyone thinks it's not a change for the better.

I noticed the appearance of NaN in some of Rel's results, where I had inadvertently been dividing a very large number by a very small number.  As it happens, those appearances did no harm at all and I was pleased not to have to write complicated extra code to avoid exceptions.  But dandl says SUMMARIZE PER can always give the "right" result.  I'm not sure that epithet can be justified for NaN in every case.

I previously referred to the use of UNION to give outer joins as involving ad hoc solutions (such as in my example using an RVA).  In some cases I would be happy to have zero appearing for the average of no numbers, such as when the given list is guaranteed to contain only positive numbers and the result is a "final" one that I just want to look at, so to speak.

By the way, I note that in Rel type RATIONAL includes NaN but I haven't seen a counterpart in type INTEGER.  CAST_AS_INTEGER(NaN) and ROUND(NaN) both give 0 (zero).  Is that a correct treatment?

Hugh

Coauthor of The Third Manifesto and related books.
Quote from Hugh on February 18, 2020, 12:24 pm
Quote from Dave Voorhis on February 17, 2020, 1:54 pm
Quote from Hugh on February 17, 2020, 12:43 pm

...  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.

In Rel, I'm thinking of changing AVG on an empty list to return the canonical floating point value NaN, or Not a Number, rather than throw exceptions as it does now (except when invoking AVG as, for example, AVG {}).

I think it's more ergonomic that way and equally correct, but I'm happy to be convinced otherwise if anyone thinks it's not a change for the better.

I noticed the appearance of NaN in some of Rel's results, where I had inadvertently been dividing a very large number by a very small number.  As it happens, those appearances did no harm at all and I was pleased not to have to write complicated extra code to avoid exceptions.  But dandl says SUMMARIZE PER can always give the "right" result.  I'm not sure that epithet can be justified for NaN in every case.

I previously referred to the use of UNION to give outer joins as involving ad hoc solutions (such as in my example using an RVA).  In some cases I would be happy to have zero appearing for the average of no numbers, such as when the given list is guaranteed to contain only positive numbers and the result is a "final" one that I just want to look at, so to speak.

By the way, I note that in Rel type RATIONAL includes NaN but I haven't seen a counterpart in type INTEGER.  CAST_AS_INTEGER(NaN) and ROUND(NaN) both give 0 (zero).  Is that a correct treatment?

The average of no numbers should only be NaN. Zero under any no-number circumstances would be incorrect.

NaN is a value in standard IEEE 754 floating point representation, upon which RATIONAL is based -- as are equivalents in most popular programming languages.

INTEGER is based on standard 32-bit (aka 'long') integers, which define no NaN. I suppose they could, but that would be a Rel-specific deviation from programming language integer convention.

CAST_AS_INTEGER(NaN) returning zero is a bit of a kludge. It's that or throw an exception.

ROUND(NaN) should return NaN, I think. It should be just a thin wrapper around a corresponding Java math routine. I'll check it when I get a moment later. It's probably a bug.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 18, 2020, 2:29 pm
Quote from Hugh on February 18, 2020, 12:24 pm
Quote from Dave Voorhis on February 17, 2020, 1:54 pm
Quote from Hugh on February 17, 2020, 12:43 pm

...  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.

In Rel, I'm thinking of changing AVG on an empty list to return the canonical floating point value NaN, or Not a Number, rather than throw exceptions as it does now (except when invoking AVG as, for example, AVG {}).

I think it's more ergonomic that way and equally correct, but I'm happy to be convinced otherwise if anyone thinks it's not a change for the better.

I noticed the appearance of NaN in some of Rel's results, where I had inadvertently been dividing a very large number by a very small number.  As it happens, those appearances did no harm at all and I was pleased not to have to write complicated extra code to avoid exceptions.  But dandl says SUMMARIZE PER can always give the "right" result.  I'm not sure that epithet can be justified for NaN in every case.

I previously referred to the use of UNION to give outer joins as involving ad hoc solutions (such as in my example using an RVA).  In some cases I would be happy to have zero appearing for the average of no numbers, such as when the given list is guaranteed to contain only positive numbers and the result is a "final" one that I just want to look at, so to speak.

By the way, I note that in Rel type RATIONAL includes NaN but I haven't seen a counterpart in type INTEGER.  CAST_AS_INTEGER(NaN) and ROUND(NaN) both give 0 (zero).  Is that a correct treatment?

The average of no numbers should only be NaN. Zero under any no-number circumstances would be incorrect.

NaN is a value in standard IEEE 754 floating point representation, upon which RATIONAL is based -- as are equivalents in most popular programming languages.

INTEGER is based on standard 32-bit (aka 'long') integers, which define no NaN. I suppose they could, but that would be a Rel-specific deviation from programming language integer convention.

CAST_AS_INTEGER(NaN) returning zero is a bit of a kludge. It's that or throw an exception.

ROUND(NaN) should return NaN, I think. It should be just a thin wrapper around a corresponding Java math routine. I'll check it when I get a moment later. It's probably a bug.

I remember posting on the old forum a comparison between the behaviour of NaN in numerics and the behaviour of NULL in SQL.  (Well to be honest I don't recall exactly because it also might have been about IEEE infinity and not IEEE NaN.  But that don't matter because even if it (that old post) was "only" about infinity (and in my remembrance the similarities were blatantly obvious) then I'm sure the comparison would hold even more for NaN.  NaN <anyop> <anynumbernotnan> = NaN.  It's the ultimate swallower of real information.  What more needs be said ?

Frankly, I'm seriously disappointed to see Hugh of all people kind of admit the practical value of NULL NaN.  Or maybe it's a matter of age and mine isn't sufficient to claim the stage of true wisdom yet ...  Who will tell ?

As for the casting to INTEGERS, I remain convinced NaN should be cause for exception here, and possibly the more pragmatic people would claim it is reasonable to cast +INF to +2147654321 and -INF to -2147654321 and in that particular case I might even be reluctant to disagree because despite everything anyone might believe I'm not that eager to discard the practical value of being pragmatic, but still I'd wonder what those same "more pragmatic people" are going to propose as a solution for a cast to the domain of ***unsigned*** integers ...  +INF to +4295654321 and -INF to ... eurhm ... 0 ?

(PS the 654321 trailing figures are just because I couldn't be bothered to lookup the real 2^31 and 2^32 values.)

Quote from Erwin on February 18, 2020, 7:22 pm

As for the casting to INTEGERS, I remain convinced NaN should be cause for exception here, and possibly the more pragmatic people would claim it is reasonable to cast +INF to +2147654321 and -INF to -2147654321 and in that particular case I might even be reluctant to disagree because despite everything anyone might believe I'm not that eager to discard the practical value of being pragmatic, but still I'd wonder what those same "more pragmatic people" are going to propose as a solution for a cast to the domain of ***unsigned*** integers ...  +INF to +4295654321 and -INF to ... eurhm ... 0 ?

I guess what I was trying to point out is that examples (signed integers vs unsigned integers) tend to get cherrypicked based upon their particular case being favourable to the "solution" that was preferred to begin with.  Or maybe not even "preferred" but just "implemented by happenstance".

Quote from Erwin on February 18, 2020, 7:22 pm
Quote from Dave Voorhis on February 18, 2020, 2:29 pm
Quote from Hugh on February 18, 2020, 12:24 pm
Quote from Dave Voorhis on February 17, 2020, 1:54 pm
Quote from Hugh on February 17, 2020, 12:43 pm

...  In my book, AVG does the "right thing" to a nonempty list of numbers but not to the empty list because it will raise an exception.

In Rel, I'm thinking of changing AVG on an empty list to return the canonical floating point value NaN, or Not a Number, rather than throw exceptions as it does now (except when invoking AVG as, for example, AVG {}).

I think it's more ergonomic that way and equally correct, but I'm happy to be convinced otherwise if anyone thinks it's not a change for the better.

I noticed the appearance of NaN in some of Rel's results, where I had inadvertently been dividing a very large number by a very small number.  As it happens, those appearances did no harm at all and I was pleased not to have to write complicated extra code to avoid exceptions.  But dandl says SUMMARIZE PER can always give the "right" result.  I'm not sure that epithet can be justified for NaN in every case.

I previously referred to the use of UNION to give outer joins as involving ad hoc solutions (such as in my example using an RVA).  In some cases I would be happy to have zero appearing for the average of no numbers, such as when the given list is guaranteed to contain only positive numbers and the result is a "final" one that I just want to look at, so to speak.

By the way, I note that in Rel type RATIONAL includes NaN but I haven't seen a counterpart in type INTEGER.  CAST_AS_INTEGER(NaN) and ROUND(NaN) both give 0 (zero).  Is that a correct treatment?

The average of no numbers should only be NaN. Zero under any no-number circumstances would be incorrect.

NaN is a value in standard IEEE 754 floating point representation, upon which RATIONAL is based -- as are equivalents in most popular programming languages.

INTEGER is based on standard 32-bit (aka 'long') integers, which define no NaN. I suppose they could, but that would be a Rel-specific deviation from programming language integer convention.

CAST_AS_INTEGER(NaN) returning zero is a bit of a kludge. It's that or throw an exception.

ROUND(NaN) should return NaN, I think. It should be just a thin wrapper around a corresponding Java math routine. I'll check it when I get a moment later. It's probably a bug.

I remember posting on the old forum a comparison between the behaviour of NaN in numerics and the behaviour of NULL in SQL.  (Well to be honest I don't recall exactly because it also might have been about IEEE infinity and not IEEE NaN.  But that don't matter because even if it (that old post) was "only" about infinity (and in my remembrance the similarities were blatantly obvious) then I'm sure the comparison would hold even more for NaN.  NaN <anyop> <anynumbernotnan> = NaN.  It's the ultimate swallower of real information.  What more needs be said ?

Frankly, I'm seriously disappointed to see Hugh of all people kind of admit the practical value of NULL NaN.  Or maybe it's a matter of age and mine isn't sufficient to claim the stage of true wisdom yet ...  Who will tell ?

As for the casting to INTEGERS, I remain convinced NaN should be cause for exception here, and possibly the more pragmatic people would claim it is reasonable to cast +INF to +2147654321 and -INF to -2147654321 and in that particular case I might even be reluctant to disagree because despite everything anyone might believe I'm not that eager to discard the practical value of being pragmatic, but still I'd wonder what those same "more pragmatic people" are going to propose as a solution for a cast to the domain of ***unsigned*** integers ...  +INF to +4295654321 and -INF to ... eurhm ... 0 ?

(PS the 654321 trailing figures are just because I couldn't be bothered to lookup the real 2^31 and 2^32 values.)

NaN's not null, because NaN isn't a universal pseudo-value that generates illogic wherever it's used.

NaN's simply a value in the IEEE 754 union type consisting of Nan | +Inf | -Inf | -FLOAT_MIN .. 0 .. +FLOAT_MAX.

For better, or worse, it's standard floating point. Those who don't like standard floating point are, of course, free to use -- or create -- alternative types.

As for type casting, there's no way to make everyone happy, given what it's doing. Whatever you do, you're either going to be throwing a gaggle of annoying exceptions and irritating the pragmatists or doing arbitrary conversions and annoying the purists.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 18, 2020, 8:18 pm

NaN's not null, because NaN isn't a universal pseudo-value that generates illogic wherever it's used.

NaN's simply a value in the IEEE 754 union type consisting of Nan | +Inf | -Inf | -FLOAT_MIN .. 0 .. +FLOAT_MAX.

For better, or worse, it's standard floating point. Those who don't like standard floating point are, of course, free to use -- or create -- alternative types.

As for type casting, there's no way to make everyone happy, given what it's doing. Whatever you do, you're either going to be throwing a gaggle of annoying exceptions and irritating the pragmatists or doing arbitrary conversions and annoying the purists.

I wondered whether NaN compares equal to itself, and how it behaves in ordering comparisons.

The very first thing I ran into on my very brief google search was this :

https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values

The following excerpts from it I found particularly, eurhm well, I'm afraid I don't really have the word to describe it :

"That is, all comparisons with the operators ==, <=, >=, <, > where one or both values is NaN returns false, contrary to the behaviour of all other values."

NaN does not compare equal to itself, so any such type does not obey the TTM behaviour-of-types prescriptions.  NaN < anynumber is false, NaN = anynumber is false, NaN > anynumber is false.  Why does all that read like something I've seen elsewhere before ?

"[The answers so far all argue that it is meaningless to compare NaNs.]  I agree, but that doesn't mean that the correct answer is false, rather it would be a Not-a-Boolean (NaB), which fortunately doesn't exist."

(Trying to get back into chair now.)

Quote from Erwin on February 19, 2020, 11:01 am
Quote from Dave Voorhis on February 18, 2020, 8:18 pm

NaN's not null, because NaN isn't a universal pseudo-value that generates illogic wherever it's used.

NaN's simply a value in the IEEE 754 union type consisting of Nan | +Inf | -Inf | -FLOAT_MIN .. 0 .. +FLOAT_MAX.

For better, or worse, it's standard floating point. Those who don't like standard floating point are, of course, free to use -- or create -- alternative types.

As for type casting, there's no way to make everyone happy, given what it's doing. Whatever you do, you're either going to be throwing a gaggle of annoying exceptions and irritating the pragmatists or doing arbitrary conversions and annoying the purists.

I wondered whether NaN compares equal to itself, and how it behaves in ordering comparisons.

The very first thing I ran into on my very brief google search was this :

https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values

The following excerpts from it I found particularly, eurhm well, I'm afraid I don't really have the word to describe it :

"That is, all comparisons with the operators ==, <=, >=, <, > where one or both values is NaN returns false, contrary to the behaviour of all other values."

NaN does not compare equal to itself, so any such type does not obey the TTM behaviour-of-types prescriptions.  NaN < anynumber is false, NaN = anynumber is false, NaN > anynumber is false.  Why does all that read like something I've seen elsewhere before ?

"[The answers so far all argue that it is meaningless to compare NaNs.]  I agree, but that doesn't mean that the correct answer is false, rather it would be a Not-a-Boolean (NaB), which fortunately doesn't exist."

(Trying to get back into chair now.)

Well, yes.

IEEE 754 is what it is. I can either use it -- which means putting up with how it works (or doesn't, depending on your point of view) -- or not use it, which means writing a numeric library from scratch. I'd rather not do that -- it's a thankless task -- so in this case, utility trumps purity.

I once considered offering RATIONAL as the built-in type with almost no built-in numeric operators (because I don't want to write them) and adding FLOAT as the canonical IEEE 754 type, but I knew what would happen: users would grumble about RATIONAL not having FLOAT's functionality and just use FLOAT. Lesson plans would be full of student-annoying notes like, "Wherever you see RATIONAL in the texts, write FLOAT. The sample databases and source code on U:\CS\CS752\AY2020\Exercises\TD already have the conversion done for you. Sorry for any inconvenience."

Nobody would appreciate the lesson.

In earlier versions of Rel, I elided NaN but it inevitably kept appearing in one way or another. You can either keep throwing unproductively annoying exceptions or give in and allow it. I allowed it.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

IMO the native float binary64 type is something to be avoided in the core of a D. The only way it can used in compliance with TTM is to define the 'set of values' as only those that are well-behaved, and raise an exception whenever any of the weird values turn up. I almost never use binary float, and when I have to I certainly don't appreciate the complications it adds.

I think it's reasonable to have a counting number type and a quantity number type, but they should offer a lot more than 15 digits of precision. I would probably go with 64-bit integer and Java BigDecimal (or equivalent) by default, but allow infinite precision as an option. The libraries are all out there now.

Andl - A New Database Language - andl.org
Quote from dandl on February 19, 2020, 12:45 pm

IMO the native float binary64 type is something to be avoided in the core of a D. The only way it can used in compliance with TTM is to define the 'set of values' as only those that are well-behaved, and raise an exception whenever any of the weird values turn up. I almost never use binary float, and when I have to I certainly don't appreciate the complications it adds.

I use a Rel-typeified Java BigDecimal for my personal databases involving money, but RATIONAL implemented as the standard float is handy for porting standard C, Java, C#, you-name-it code examples that use float or double, as many do.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org