The Forum for Discussion about The Third Manifesto and Related Matters

SUMMARIZE PER, OUTER JOIN and image relations

Quote from dandl on February 23, 2020, 11:53 pm

I'm happy to violate this in the form of changing or removing existing operators and types in a teaching tool. So I might, if appropriate, do so in Rel. The resulting conversion pain for those of us who use it in production -- and/or for teaching -- will be relatively brief but the end result will be improvement for us and others.

I appreciate that conversion processes are annoying -- and for that I apologise in advance -- but I don't think any of us who are using Rel for personal production purposes are going to have to insert control rods in the reactors and shut down the power station. At worst, an afternoon or two of gentle conversion effort, if even that.

Rel is arguably a toy language, in perpetual beta, and a bit of instability is to be expected. There are many pieces of code out there that lie somewhere further along the stability/maturity spectrum, with many dependencies and significant consequences if it changes. I have a fairly substantial piece of Ruby on Rails which I can no longer get to work, because features I depended on have been deprecated and then removed. Maybe I shouldn't have done it that way, but the RoR team killed my project.

Some would say RoR is a toy, but this sort of thing happens in industrial software, too. On my current contract, I'm working on a small tool for populating a particular database with test data. It hadn't been touched for almost a decade and some of the Java libraries it used have subsequently been deprecated (they might even have been deprecated when the tool was built), then really deprecated, then -- finally -- with Java 11, removed from the JDK. Now I'm replacing them with alternatives. No big deal -- I'll have it done by the end of today.

But for commercial or commercial-grade production tools, absolutely not. The semantics of every operator must be static, self-contained (per above), permanent, and perpetual, unless it can be proven that fixing broken functionality cannot possibly break production systems. Otherwise, fixes and functionality may only be provisioned by adding operators and associated types without changing or removing existing ones.

Modal behaviour, whether programmatically alterable or only via a system configuration -- which implies operator semantics that are externally determined and defined by more than parameters and the operator itself -- is categorically forbidden. It's too wrong to be allowed in any form.

Then you rule out one strategy for dealing with serious bugs and mistakes. Assume you have a language product that depends on a library that has traditionally been part of the core, something like maths, date/time, regex, crypto, something important and hard. The library has had known problems: bugs, precision, arbitrary limits, hackable, whatever.  The old library is no longer maintained/maintainable, and it is proposed to replace the library by a new one that is API compatible, but inevitably has small differences in semantics. All new users want the new improved library, all existing production users want their code to keep working as is without fail. Getting locked into an old revision of the product is not an option, because of updates in other parts of the product.

My strategy is to make the library a user choice, and the same code may work slightly differently depending on which is chosen. Your strategy, please?

The standard approach, and the one I advocate, is:

  1. If a fix (not a change) can be introduced without changing semantics and therefore normally without breaking existing code, release a newly-numbered version of an existing library that employs the same API but includes the fix. For example, a security fix: library routine foo(p, q) now has a buffer overrun trap where previously it could overrun into user memory under pathological conditions. Under normal circumstances, the fix should be imperceptible. User developers should be instructed to migrate as soon as possible to the new library, e.g., from stdlib-release-1.0.0.jar to stdlib-release-1.0.1.jar. In some environments, it may be possible to indicate to user developers that stdlib-release-1.0.0.jar is deprecated.
  2. If a change cannot be introduced without possibly (or definitely) breaking existing code -- i.e., a semantic change -- release a new, distinct API. The broken API continues to be provided, but is flagged as deprecated and eventually withdrawn. The @Deprecated annotation in Java is used for precisely this, to indicate classes or methods that may be used but should be avoided, and that will eventually (usually years later) be removed.
I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

You've obviously thought about this a lot, but your strategy still only covers part of the problem, bits you've run into, skirting around the bits that I say justify a different strategy.

I'm talking about:

  • Core language features and/or libraries, not semi-optional add-ons. If the date/time libraries built into the core product have a leap year bug (like Excel) you cannot fix it by issuing a new API. Ditto for regex, crypto, maths, collections, streams, etc, etc. Core features, included in the standard release, widely used.
  • Not a simple buffer overrun (although any new trap can still kill old code) but a subtle difference like an RNG or loss of precision, or an outright bug that produces wrong (but expected) results.
  • Languages being used by people you don't even know for purposes you cannot even guess at, who have a large old code body they badly need to just keep working regardless. They cannot migrate old code, and they want to use new features.

We've been living with this for years. Now we never introduce a feature or fix a bug that may change behaviour of existing code without providing a configuration option to allow users to keep their old code working. There is no realistic choice.
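
The configuration-option strategy described above can be sketched roughly as follows. This is a minimal illustration, not taken from any real product: `CompatConfig`, its `legacy_leap_year` flag and `is_leap_year` are names invented here. The legacy branch mimics the spreadsheet behaviour mentioned earlier of treating 1900 as a leap year; the default branch applies the Gregorian rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompatConfig:
    """Hypothetical user-selectable configuration (invented for this sketch)."""
    legacy_leap_year: bool = False  # True keeps the old, buggy behaviour


def is_leap_year(year: int, config: CompatConfig = CompatConfig()) -> bool:
    """Return whether `year` is a leap year under the selected behaviour."""
    if config.legacy_leap_year and year == 1900:
        # Old behaviour retained for existing code: 1900 is wrongly a leap year.
        return True
    # Corrected behaviour: the Gregorian rule.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


old = CompatConfig(legacy_leap_year=True)
new = CompatConfig()
print(is_leap_year(1900, old), is_leap_year(1900, new))
```

The same call gives different answers depending on configuration, which is the property the two sides of this thread disagree about.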

Andl - A New Database Language - andl.org
Quote from dandl on February 25, 2020, 2:48 am

You've obviously thought about this a lot, but your strategy still only covers part of the problem, bits you've run into, skirting around the bits that I say justify a different strategy.

I'm talking about:

  • Core language features and/or libraries, not semi-optional add-ons. If the date/time libraries built into the core product have a leap year bug (like Excel) you cannot fix it by issuing a new API. Ditto for regex, crypto, maths, collections, streams, etc, etc. Core features, included in the standard release, widely used.
  • Not a simple buffer overrun (although any new trap can still kill old code) but a subtle difference like an RNG or loss of precision, or an outright bug that produces wrong (but expected) results.
  • Languages being used by people you don't even know for purposes you cannot even guess at, who have a large old code body they badly need to just keep working regardless. They cannot migrate old code, and they want to use new features.

We've been living with this for years. Now we never introduce a feature or fix a bug that may change behaviour of existing code without providing a configuration option to allow users to keep their old code working. There is no realistic choice.

Your two bullet points justify precisely the two approaches I listed -- plus a possible third (which is arguably a variation on the second), the one taken by Python when going from version 2 to version 3: Release a whole new product inspired by the original, but clearly distinct and incompatible.

Applied computer science has trodden this ground well, and determined that these are the least harmful ways to handle upgrading, patching and fixing products (particularly language products) without introducing worse problems.

Feature flags, in whatever guise, when they wind up in user space cause more problems than they solve.

As for a "large old code body they badly need to just keep working regardless", this is equivalent to what I dealt with supporting a large domain-specific office automation product in the 90's: There was a small set of sites that refused to upgrade to new versions. During its lifetime, we released multiple DOS (actually, DOS Extended Mode) versions followed by multiple Windows versions. The majority of users delightedly moved to the Windows product as soon as it was released. A few holdouts steadfastly refused to migrate, but were adamant that new functionality from the Windows version be backported to the last DOS version, or -- in at least one case -- to an early DOS version because they refused to upgrade to later ones.

Aside from meeting statutory legal requirements by providing updates to a payroll deduction calculation module, we categorically refused. The Windows version is the way to get updated functionality; you can't have it both ways.

Quote from Hugh on February 21, 2020, 1:00 pm

(NaN = NaN) = TRUE is very good.  It means one can use a WHERE clause to discover or eliminate tuples with NaN, with good knock-on effects for JOIN, MATCHING, etc.   With Rel  one can't do that.

The IEEE model is to have an operator IS_NAN whose value is boolean and can be used to detect NaNs.
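
As a small sketch of what that means for filtering, assuming Python floats (which follow IEEE comparison rules on common platforms) and using `math.isnan` as a stand-in for IS_NAN: an equality test cannot select the NaN rows, while the predicate can. The `readings` data is made up for illustration.

```python
import math

nan = float("nan")
# Hypothetical "tuples": (id, reading) pairs, one of which holds a NaN.
readings = [(1, 2.5), (2, nan), (3, 0.0)]

# An equality test never matches a NaN under IEEE comparison rules...
by_equality = [rid for rid, value in readings if value == nan]

# ...so detection goes through an explicit predicate instead.
by_predicate = [rid for rid, value in readings if math.isnan(value)]
without_nan = [rid for rid, value in readings if not math.isnan(value)]

print(by_equality, by_predicate, without_nan)
```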

I wonder what the IEEE committee thought went wrong with (NaN = NaN) = TRUE.

That requires a fair amount of explanation.  (I write 1 for an exact rational, 1.0 for a floating-point number.)

Floating-point numbers can be mapped onto intervals of exact rational numbers.  (An earlier version of this remark added that float arithmetic is interval arithmetic, but that was an error on my part.)  For example, the immediate representable neighbors of 1.0 (a float) are b = 0.99999999999999989 below and a = 1.0000000000000002 above.  So the interval corresponding to 1.0 runs from halfway between b and 1 (an exact rational) to halfway between 1 and a.
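
The interval around 1.0 can be computed exactly, as a rough check of the figures above. This sketch assumes Python floats are IEEE binary64 (true on common platforms) and uses `fractions.Fraction` for the exact rationals; the variable names are chosen here for illustration.

```python
import sys
from fractions import Fraction

eps = sys.float_info.epsilon   # gap between 1.0 and the next float above it
above = 1.0 + eps              # nearest representable neighbour above 1.0
below = 1.0 - eps / 2          # nearest neighbour below (spacing halves under 1.0)

# Fraction(float) converts exactly, so the midpoints are exact rationals.
lower = (Fraction(below) + 1) / 2
upper = (1 + Fraction(above)) / 2

print(repr(below), repr(above))
print(lower, upper)
```

The interval is not symmetric about 1: the lower half is half as wide as the upper half, because the spacing between floats changes at a power of two.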

There are five special cases corresponding to special kinds of intervals.  The interval around 0 is split in two: the non-negative values correspond to 0.0 and the negative values correspond to -0.0.  These floats are equal but have different signs.  +Inf corresponds to an interval running from just above the largest representable number up to (affine) positive infinity, and -Inf corresponds to a similar interval at the other end.
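
A quick sketch of the signed zeros and the infinities, again assuming Python floats behave as IEEE binary64; `math.copysign` is used only to expose the sign of a zero.

```python
import math
import sys

pos_zero, neg_zero = 0.0, -0.0
# The two zeros compare equal but carry different signs.
zeros_equal = pos_zero == neg_zero
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# Doubling the largest finite float overflows to positive infinity.
largest = sys.float_info.max
overflowed = largest * 2.0

print(zeros_equal, sign_of_neg_zero, overflowed, -overflowed)
```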

Finally, NaN is ambiguous because it can correspond to either of two intervals.  0.0/0.0 = NaN because any number times 0 equals 0, so the corresponding interval is the universal one from negative infinity to positive infinity.  But 5.0/0.0 = NaN because no number times 0 equals 5, so the corresponding interval is empty.

Because of the ambiguity, and the vagueness of the universal interval, it would be misleading to treat two NaNs as equal, or to treat NaN as orderable with any float including a NaN.  Since IEEE's logic is two-valued, all of =, <, ≤, >, and ≥ return false if either argument is NaN, and their negations ≠, ≮, ≰, ≯, and ≱ all return true.  Unfortunately this means that neither total ordering nor the trichotomy law survives, but we can't have everything.
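
The comparison rules just described can be observed directly. A minimal sketch, assuming Python floats follow the IEEE rules (they do on common platforms); the negated operators are spelled with `not` since Python has no ≮ symbol.

```python
nan = float("nan")
x = 1.0

# Every equality or ordering comparison involving NaN is false...
comparisons = [nan == nan, nan < x, nan <= x, nan > x, nan >= x, x == nan]
# ...and the negated forms are all true.
negations = [nan != nan, not (nan < x), not (nan <= x), not (nan > x), not (nan >= x)]

# Trichotomy would require exactly one of <, ==, > to hold; here none does.
trichotomy_holds = (nan < x) or (nan == x) or (nan > x)

print(comparisons, negations, trichotomy_holds)
```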

Quote from Dave Voorhis on February 24, 2020, 11:03 am
  1. If a fix (not a change) can be introduced without changing semantics

I don't even know what that means.  If the semantics are the same as before, it is not a fix, it is at best a refactor or a performance improvement.  Any other change does potentially break existing code.

  1. For example, a security fix: library routine foo(p, q) now has a buffer overrun trap where previously it could overrun into user memory under pathological conditions. Under normal circumstances, the fix should be imperceptible.

Unless of course the user depends on the precise nature of that overrun.  In order for "fix" to be a meaningful concept, there must be an agreed standard other than the code that specifies what should happen.  Then if it does not happen, the code (possibly on both sides) must be changed to conform.

There are people who hold that it is unprofessional to use any language that does not have a third party standard for precisely this reason.  Unfortunately, most of us so-called professionals have no more choice about what languages we use than a ditchdigger has about what shovel he uses.

 

Quote from johnwcowan on March 2, 2020, 2:00 am
I don't even know what that means.  If the semantics are the same as before, it is not a fix, it is at best a refactor or a performance improvement.  Any other change does potentially break existing code.

Assume we have an API function with fully specified behaviour (say, an RNG that is guaranteed to repeat after N cycles). The software is published, in use by customers, and then a bug is discovered (say, the RNG actually repeats after N/2 cycles). We would like to fix the bug so it conforms to the spec, and we expect (hope) no-one will be affected.  But when we release a beta version with this change in it, we get an issue raised by a customer saying it broke their software, they don't have anyone with the skills to fix it, and would we please put it back the way it was.

The published semantics are the same. It's a bug, so we ought to fix it. It will potentially break existing code (that relies on the bug). Your move.

 

Quote from dandl on March 2, 2020, 2:59 am

Assume we have an API function with fully specified behaviour (say, an RNG that is guaranteed to repeat after N cycles). The software is published, in use by customers, and then a bug is discovered (say, the RNG actually repeats after N/2 cycles). We would like to fix the bug so it conforms to the spec, and we expect (hope) no-one will be affected.  But when we release a beta version with this change in it, we get an issue raised by a customer saying it broke their software, they don't have anyone with the skills to fix it, and would we please put it back the way it was.

The published semantics are the same. It's a bug, so we ought to fix it. It will potentially break existing code (that relies on the bug). Your move.

Fix it.  If the customer moans, tell them they should have reported your failure to meet the spec in the first place.  If they continue to moan, cut your losses: some customers cost you far more than they are worth, not just in money but in engineering time and in tsuris.

If some huge fraction of your income depends on this customer, broaden your customer base as quickly as possible.  At a former employer, there was a list on the wall of the 20 customers (out of about 90,000 customers worldwide) who provided some huge fraction of corporate revenue.  I thought that was a damned embarrassment, something to fix, not to brag to (or warn) employees about.

 

Quote from dandl on March 2, 2020, 2:59 am
Assume we have an API function with fully specified behaviour (say, an RNG that is guaranteed to repeat after N cycles). The software is published, in use by customers, and then a bug is discovered (say, the RNG actually repeats after N/2 cycles). We would like to fix the bug so it conforms to the spec, and we expect (hope) no-one will be affected.  But when we release a beta version with this change in it, we get an issue raised by a customer saying it broke their software, they don't have anyone with the skills to fix it, and would we please put it back the way it was.

The published semantics are the same. It's a bug, so we ought to fix it. It will potentially break existing code (that relies on the bug). Your move.

The customer needs to hire the skills to update to the new version of the product with the old RNG -- same old buggy behaviour as before -- and the new RNG2 with the fix, and switch to RNG2. This is the accepted approach, recognised by industry because it's the approach that works.
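
A rough sketch of that arrangement, with everything in it assumed for illustration: a toy generator with a stated period of 16, an old `rng` whose cycle is only half that long, and a separately named `rng2` that meets the stated period. The old function is left unchanged and merely flagged via Python's `DeprecationWarning`; the constants and function names are invented here.

```python
import warnings

MODULUS = 16  # hypothetical spec: the generator repeats after 16 values


def rng(state: int) -> int:
    """Old generator, kept unchanged: its cycle is only MODULUS / 2 long."""
    warnings.warn("rng is deprecated; use rng2", DeprecationWarning, stacklevel=2)
    return (5 * state + 2) % MODULUS


def rng2(state: int) -> int:
    """New, separately named generator that meets the stated period."""
    return (5 * state + 1) % MODULUS


def period(step, seed: int = 0) -> int:
    """Count steps until the sequence starting at `seed` returns to `seed`."""
    count, state = 1, step(seed)
    while state != seed:
        state = step(state)
        count += 1
    return count


with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    old_period = period(rng)
new_period = period(rng2)
print(old_period, new_period)
```

Existing callers of `rng` keep getting exactly the sequence they had before; switching to the fix is an explicit edit to call `rng2`, not a change in what `rng` means.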

Arguments in favour of various forms of well-known (and frequently-tried, invariably with unpleasant results) bad ideas are not sustainable.

I'm sure there are exceptional cases that warrant exceptional fixes. They can be ignored because they're exceptional. We are talking about best practices here, aren't we? Not kludges?