The Forum for Discussion about The Third Manifesto and Related Matters


TTM's IM: a 'solution' to a question nobody asked; then how else to express squares are rectangles?

Quote from Dave Voorhis on October 25, 2019, 11:06 am
Quote from Erwin on October 25, 2019, 10:36 am

(I also think the argument about "having to compute MST all the time" is flawed, but never have been able to show where&how, so I also think the "leads to bad performance of necessity" pundits might have a valid point even if they can't prove it.)

The MST shouldn't have to be computed any more often than computing the parameter type(s) and best invocation target has to be done in any multiple-dispatch system, since that's what it is. I.e., it only has to be computed when the invocation target (i.e., the specific operator implementation to call) is otherwise ambiguous. Alternatively, it can be computed once when a value is selected.

Sadly, not true for relations. The MST has to be recomputed whenever a new relation value is created, such as when a relvar is updated. You can optimise insertions to a point, but for deletions the MST has to be computed by examining every tuple. That's expensive! There are lots of things you can do without needing the MST, but for some you do and picking the right overloaded/multi-dispatched function is one of them.

 

Andl - A New Database Language - andl.org
Quote from dandl on October 26, 2019, 9:32 am

Sadly, not true for relations. The MST has to be recomputed whenever a new relation value is created, such as when a relvar is updated. You can optimise insertions to a point, but for deletions the MST has to be computed by examining every tuple. That's expensive! There are lots of things you can do without needing the MST, but for some you do and picking the right overloaded/multi-dispatched function is one of them.

A relation of n tuples can mean an operation on that relation may require ≤ n MST computations, but that's notionally true of any collection of n items in any language. One item means computing something once; n items may mean computing it up to and including n times...

Unless I'm missing something unstated here (which is entirely possible, even likely -- I haven't had coffee yet) updates shouldn't require recomputing the MST on every tuple, only on the tuples whose attribute values are passed as an argument to an overloaded operator, and that's optimisable by caching value-invocation pairs at the point of invocation. A relvar heading doesn't change due to an update; it's always its declared type.
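A minimal Python sketch of the caching idea in the paragraph above, under stated assumptions: the Rectangle type, most_specific_type, and the dispatch table are hypothetical names for illustration, not anything from Rel or TTM. The MST is computed only for a value that actually reaches an overloaded operator, and the result is memoised per value.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Rectangle:
    """Hypothetical value type; frozen so that values are hashable and cacheable."""
    length: int
    width: int


@lru_cache(maxsize=None)
def most_specific_type(value: Rectangle) -> str:
    """Compute the MST of one value; cached, so each distinct value is examined once."""
    return "SQUARE" if value.length == value.width else "RECTANGLE"


def describe_square(v: Rectangle) -> str:
    return f"square of side {v.length}"


def describe_rectangle(v: Rectangle) -> str:
    return f"rectangle {v.length} x {v.width}"


# Hypothetical dispatch table: MST name -> operator implementation.
IMPLEMENTATIONS = {"SQUARE": describe_square, "RECTANGLE": describe_rectangle}


def describe(value: Rectangle) -> str:
    """Overloaded operator: the MST is needed only here, at the point of invocation."""
    return IMPLEMENTATIONS[most_specific_type(value)](value)
```

Calling describe twice on equal values hits the cache the second time; values that are never passed to an overloaded operator never have their MST computed at all.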

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Erwin on October 25, 2019, 1:11 pm

I'm not wanting to 'pick on' Erwin's reply specifically, but it gives me more material to riff on.

All possreps ***are*** common between any type and any of its subtypes (all possreps of the supertype at hand, that is, of course).

Suppose our 'domain of discourse' is not just rectangles/squares, polygons but also ellipses/circles. What PossRep would you like to choose that is common amongst them? From what I can see in Date's writing, the only answer is to form an IM UNION type between the polygons and curves. But UNION types still don't have a PossRep in common -- which lack is the typical reason for needing UNIONs.

I can offer a PossRep after a fashion: SuperEllipses aka Lamé curves. Curiously it can describe ellipses, circles, squares, rhombuses, but not non-rhombus parallelograms, nor non-square rectangles. (To be precise, you get rectangles in the limit as the exponent goes to infinity. I'm not sure I want that in my database.)
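For reference, a Lamé curve with semi-axes a, b and exponent n is the set of points satisfying |x/a|^n + |y/b|^n = 1. A small Python sketch of that membership test (the function name and tolerance are illustrative assumptions): n = 2 gives an ellipse (a circle when a = b), n = 1 gives a rhombus (a square standing on a corner when a = b), and the corner (a, b) of the bounding rectangle is never on the curve for any finite n, which is the "only in the limit" caveat.

```python
import math


def on_superellipse(x: float, y: float, a: float, b: float, n: float,
                    tol: float = 1e-9) -> bool:
    """True if (x, y) satisfies |x/a|^n + |y/b|^n = 1 to within tol."""
    lhs = abs(x / a) ** n + abs(y / b) ** n
    return math.isclose(lhs, 1.0, abs_tol=tol)


# n = 2, a = b = 1: the unit circle contains (0.6, 0.8).
circle_point = on_superellipse(0.6, 0.8, 1, 1, 2)

# n = 1, a = b = 1: the "diamond" square contains (0.5, 0.5).
diamond_point = on_superellipse(0.5, 0.5, 1, 1, 1)

# The corner (a, b) gives 1 + 1 = 2 for every finite n, so it is never on the curve.
corner_point = on_superellipse(1, 1, 1, 1, 50)
```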

My observation here is contra Date's "intrinsic" as in "a certain constraint, ... by examining just those properties that are intrinsic to values of type T in general." [page numbered 5]:

There's nothing "intrinsic" to the objects we're trying to describe. We choose some representation sufficient for the purposes in the database.

 

Bringing SQUARE into the RECTANGLE hierarchy is done exactly by writing TYPE SQUARE IS RECTANGLE CONSTRAINT the_length(...)=the_width(...).

Sure. Because in our analysis of the 'business domain' for this database/schema, it's useful to treat squares as a form of rectangle. Please now show the declarations for treating ellipses/circles together with rectangles/squares as forms of figures of the plane.

Now SQUARE has the length,width possrep (or slightly more precisely it has all the operators that come with that possrep).  There's no "assuming it into existence" because types (such as RECTANGLE) have possreps, per the prescriptions of TTM sans IM.  No "weasel words" needed to "suggest" any such thing.

Those are all analysis/design decisions. There's nothing "intrinsic". And for every such design decision, I'm pretty sure I could find properties that apply for the supertype but not the subtype.

  • For rectangles/squares, look at these definitions for portrait (adj) vs landscape (adj). Rectangles have those properties; squares do not.
  • Ellipses have major and minor semi-axes; circles do not.

So I want to say having "all the operators" is not an intrinsic property, it's an emergent property from our choice of representation. The representation is not forced to have a type RECTANGLE with components length, width. Some representation exhibits rectangularity just in case we can declare/implement methods length, width for its type.

Then from the point of view of programming ergonomics: it's more flexible to add emergent properties as overloadings for a method after defining the datatype. (And if we did define those properties as intrinsic to the datatype, adding the overloading is trivial.)

It feels slightly like you are thinking of physical representation,

No I'm thinking of a non-redundant representation that is not prone to update anomalies. If I know some entity in my database has the property it is necessarily square, I should represent its side as a single component of the data structure. If I further know (or later learn) I'll need to treat its shape together with rectangles, I overload the rectangle properties to treat the squares as well. (You might say I could represent the squares as rectangles subject to a constraint so there's no danger of update anomalies; I could build an API or view or PossRep so I only need provide a single component for side vs two for length, width; there'd be constant chafing against Occam's razor.)

and indeed if you ***want*** a physical possrep for SQUARE holding only one 'sideLength' value, then additional machinery is needed to make RECTANGLE's required THE_WIDTH() and THE_LENGTH() available again (contractual obligation prescribed by the model).  I believe TIRT does discuss that somewhere.  But at any rate such considerations are beyond the purview of the model per se.

And I believe I've written at least once on the old forum that ***if*** you make INT a subtype of RAT then you do "inherit" a division operator that takes two INTs and returns a RAT.  If you want the traditional integer division besides that, then that's ***another*** operator needing its own signature.

No that's another design decision: does the division operator always return RAT; vs does it always return a type same as its arguments (as do plus, minus, multiply typically); (vs is it not overloaded for INT at all)? Yes you might have other operators for INT division.

 

Quote from AntC on October 27, 2019, 7:11 am
Quote from Erwin on October 25, 2019, 1:11 pm

My observation here is contra Date's "intrinsic" as in "a certain constraint, ... by examining just those properties that are intrinsic to values of type T in general." [page numbered 5]:

There's nothing "intrinsic" to the objects we're trying to describe. We choose some representation sufficient for the purposes in the database.

 

Now SQUARE has the length,width possrep (or slightly more precisely it has all the operators that come with that possrep).  There's no "assuming it into existence" because types (such as RECTANGLE) have possreps, per the prescriptions of TTM sans IM.  No "weasel words" needed to "suggest" any such thing.

Those are all analysis/design decisions. There's nothing "intrinsic". And for every such design decision, I'm pretty sure I could find properties that apply for the supertype but not the subtype.

...

So I want to say having "all the operators" is not an intrinsic property, it's an emergent property from our choice of representation. The representation is not forced to have a type RECTANGLE with components length, width. Some representation exhibits rectangularity just in case we can declare/implement methods length, width for its type.

Then from the point of view of programming ergonomics: it's more flexible to add emergent properties as overloadings for a method after defining the datatype. (And if we did define those properties as intrinsic to the datatype, adding the overloading is trivial.)

Oh, I should point out that the OOP practice of 'bundling' methods/overloadings inside classes/instances is again forcing a property to be intrinsic to the class. TTM is correct to criticise that; I'm criticising the IM on pretty much the same grounds: don't mix up data descriptions with behaviour. (Wanting something that's essentially square to behave sometimes rectangularly.)

Quote from AntC on October 27, 2019, 7:11 am

I can offer a PossRep after a fashion: SuperEllipses aka Lamé curves. Curiously it can describe ellipses, circles, squares, rhombuses, but not non-rhombus parallelograms, nor non-square rectangles. (To be precise, you get rectangles in the limit as the exponent goes to infinity. I'm not sure I want that in my database.)

[...]

Please now show the declarations for treating ellipses/circles together with rectangles/squares as forms of figures of the plane.

The WP article you cite pretty much lays it out already.

  • For rectangles/squares, look at these definitions for portrait (adj) vs landscape (adj). Rectangles have those properties; squares do not.
  • Ellipses have major and minor semi-axes; circles do not.

Sure they do, it's just that they are equal.  You might as well say that the square root relation doesn't apply to all the complex numbers, because 0.0 doesn't have two roots but only one.  It has two roots that happen to be equal.

So I want to say having "all the operators" is not an intrinsic property, it's an emergent property from our choice of representation. The representation is not forced to have a type RECTANGLE with components length, width.

The biter bit: this confuses substitutional quantification with objectual quantification, or the representation with the represented, just as you like.  (What you said I was doing.)   The representation can be a list of black pixels for all it matters to anyone.

Some representation exhibits rectangularity just in case we can declare/implement methods length, width for its type.

Just so.

Then from the point of view of programming ergonomics: it's more flexible to add emergent properties as overloadings for a method after defining the datatype. (And if we did define those properties as intrinsic to the datatype, adding the overloading is trivial.)

This distinction between emergent and intrinsic properties is like the distinction between primary and secondary keys: it makes no difference.

No I'm thinking of a non-redundant representation that is not prone to update anomalies.

Update anomalies apply only where there is update, and values can't be updated.  You can no more change a rectangle to a square than you can change a leopard to a tiger.

If I know some entity in my database has the property it is necessarily square, I should represent its side as a single component of the data structure.

Where does the "should" come in?  You can represent it any way you choose.  Nobody needs to know how.  For that matter, you can have two rectangle constructors: one that constructs square rectangles and another that constructs non-square ones, if you want to.

No that's another design decision: does the division operator always return RAT; vs does it always return a type same as its arguments (as do plus, minus, multiply typically); (vs is it not overloaded for INT at all)? Yes you might have other operators for INT division.

What is "the" division operator?  There are many division operators.  Integral division accepts two integers, a numerator and a denominator, and returns two, a quotient and a remainder.  To deserve the name division, it should satisfy the identity dq + r = n.  But there are a great many such operators; at least six that return both values, and add twelve more that discard either the quotient or the remainder.
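To make the "many division operators" point concrete, here is a short Python sketch of three common conventions (floored, truncated, Euclidean). The function names are illustrative, and this is not a catalogue of the six mentioned above; all three satisfy d*q + r = n and differ only in how they treat negative operands.

```python
def floor_divmod(n: int, d: int) -> tuple[int, int]:
    """Quotient rounded toward negative infinity (Python's built-in behaviour)."""
    return divmod(n, d)


def trunc_divmod(n: int, d: int) -> tuple[int, int]:
    """Quotient rounded toward zero (the convention of C's / and %)."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - d * q


def euclid_divmod(n: int, d: int) -> tuple[int, int]:
    """Remainder always non-negative."""
    q, r = divmod(n, d)
    if r < 0:  # can only happen when d is negative
        q, r = q + 1, r - d
    return q, r


def satisfies_identity(n: int, d: int, q: int, r: int) -> bool:
    """The defining identity from the post: d*q + r = n."""
    return d * q + r == n
```

For n = -7, d = 2 the floored and Euclidean versions return (-4, 1) while the truncated version returns (-3, -1); for n = 7, d = -2 all three disagree. Dropping either component of the pair gives the quotient-only and remainder-only variants.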

Quote from AntC on October 27, 2019, 7:33 am

(Wanting something that's essentially square to behave sometimes rectangularly.)

Every value that is square can behave rectangularly, for every square is a rectangle.  A place that can only be occupied by squares may or may not be able to contain rectangles.

Quote from AntC on October 25, 2019, 5:52 am

... a 2-part post covering the two parts of the title. The second part is a response to Dave's request for elaboration of my sketch for a 'domain'-based type system. I wanted first to motivate why I see TTM's IM as having missed the point about squares/rectangles/etc.

Here's part 2. [Oh, that turned out easier to write than I expected. Done.]

Dave's request (From the Codd 1970 'domain' thread)

Quote from Dave Voorhis on October 23, 2019, 8:50 am
Quote from AntC on October 23, 2019, 6:29 am

 

You might have to invent them. Most of computer science and engineering is a result of someone being dissatisfied with X's treatment of Y.

OK. Starter for 10 ... I'll begin at the end and work back to how I got there. (This is contra the quote at Ch 1 of TIRT.)

  • ...
  • There's a form of inheritance (not shown) in those supported operations: the decl for Arith operations can insist all domains it supports already support Ord.
  • ...

You could ..., show the inheritance bits that are not shown, maybe compare canonical examples in the IM to your M (e.g., square/rectangle, circle/ellipse, etc.),

The 'inheritance bits not shown' are using a more powerful feature. I'll show it first for inheritance-in-general-and-stuff, then get to "how else to express squares are rectangles".

METHOD ==(t, t) RETURNS BOOL;
METHOD <=(t, t) RETURNS BOOL
  REQUIRES ==(t, t) BOOL;
METHOD >(t, t) RETURNS BOOL
  REQUIRES <=(t, t) BOOL
  DEFAULT x > y = NOT( x <= y);

(I'm trying to mimic Tutorial D style OPERATOR declarations; apologies if it's not quite right.)

  • METHOD introduces an overloadable function/operator.
  • The declaration gives the most general type signature for the method.
  • The signature has type variables (t in the examples) to indicate the type-overloadable parameters.
  • There might be more than one type variable -- then the method needs multiple despatch.
  • There might be type variables in the RETURNS side of the signature -- indicating a polymorphic return type.
  • The REQUIRES says: any type t with an overloading for <=(t, t) BOOL must already have an overloading for ==(t, t) BOOL.
  • (So this is how to do inheritance for squares/rectangles, but it's a more general feature.)
  • Typically you'd expect a bunch of functions/operators to 'belong together', for example testing for less-equal together with testing for greater-than. So REQUIRES gives that bunching.
  • DEFAULT says: in the absence of explicit overloading for some type, use this default overloading/implementation (which in the case of > will go looking for an overloading for <=). If there is an explicit overloading, that overrides the default (useful for giving more efficient implementations for some types).

OVERLOAD ==(x INT, y INT) RETURNS BOOL = ... x ... y ...; // low-level code giving the implementation for equality-testing INT

In a multi-module/library context, typically the OVERLOAD declaration would appear with the type decl. Of course the programmer can give user-defined overloadings for == alongside their user-defined types. (The RETURNS ... can be omitted in case it's inferable from the METHOD decl; the return type must be consistent with that decl -- i.e. consistently substitute INT for t throughout the signature. In general I'm going to omit return types from here on.)

Therefore these methods live in an open world (almost totally unlike the IM): we can freely add implementations/overloadings for yet-to-be-declared types; we can freely extend the bunchings/inheritance for yet-to-be-declared methods.
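One way to picture METHOD / REQUIRES / DEFAULT / OVERLOAD is as an open table of overloadings per method. The Python sketch below is only an analogy under stated assumptions: the Method class and its supports/overload names are invented for illustration, it handles binary methods only, and it dispatches on the first argument's type rather than doing full multiple dispatch.

```python
class Method:
    """Hypothetical stand-in for a METHOD declaration: an open table of overloadings."""

    def __init__(self, name, requires=(), default=None):
        self.name = name
        self.requires = tuple(requires)  # methods that must already be overloaded
        self.default = default           # DEFAULT implementation, if any
        self.overloads = {}              # type -> implementation

    def supports(self, typ):
        """True if typ has an explicit overloading, or the default is usable for it."""
        if typ in self.overloads:
            return True
        return self.default is not None and all(r.supports(typ) for r in self.requires)

    def overload(self, typ, impl):
        """OVERLOAD: rejected unless every REQUIRES method already supports typ."""
        for req in self.requires:
            if not req.supports(typ):
                raise TypeError(
                    f"{self.name} for {typ.__name__} requires {req.name} first")
        self.overloads[typ] = impl

    def __call__(self, x, y):
        impl = self.overloads.get(type(x))
        if impl is not None:
            return impl(x, y)            # an explicit overloading wins
        if self.supports(type(x)):
            return self.default(x, y)    # otherwise fall back to DEFAULT
        raise TypeError(f"no overloading of {self.name} for {type(x).__name__}")


# The three declarations from the post, transcribed.
EQ = Method("==")
LE = Method("<=", requires=[EQ])
GT = Method(">", requires=[LE], default=lambda x, y: not LE(x, y))

# Overloadings can be added at any time, for any type (the open world).
EQ.overload(int, lambda x, y: x == y)
LE.overload(int, lambda x, y: x <= y)
```

With only == and <= overloaded for int, GT(3, 2) is answered by the default; an attempt to overload <= for a type that has no == overloading is refused.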

OK. Everybody happy so far? Where I'm going now is that 'rectangularity'/'squareness' from earlier in this thread amount to declaring a bunch of methods/operators/functions/overloadings.

From TIRT, page 5

Any operation that can be applied to values of type RECTANGLE can be applied to values of type SQUARE as well (because squares are rectangles).
For example, suppose we have an operator—actually a function—called AREA_OF that returns the area of a given rectangle. Then we can certainly invoke the AREA_OF operator with an argument of type SQUARE, ...

OK I've just shown how to declare a method (say AREA_OF) that can be applied (overloaded) for values of type RECTANGLE/SQUARE/CIRCLE/ELLIPSE, ... without requiring a common PossRep. And I've shown a mechanism (REQUIRES) by which we can express -- not the same as TIRT, but looking through the telescope from the other end -- any type declared to exhibit 'squareness' must already be declared to exhibit 'rectangularity', in which "declared to exhibit <property>" amounts to: has an overloading declared for operation(s) characteristic of that property.

METHOD SIDE_OF(t) RETURNS RAT
  REQUIRES AREA_OF(t), LENGTH_OF(t), WIDTH_OF(t)
  DEFAULT SIDE_OF(x) = LENGTH_OF(x);

SIDE_OF gives the side of a square; anything with a side (i.e. anything exhibiting squareness) must exhibit rectangularity, that is by having those REQUIRES methods declared/overloaded (and we can rely on those methods being available for overloadings of SIDE_OF).
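A rough Python analogue of overloading AREA_OF across types with no common representation, using the standard library's functools.singledispatch. The type names, field names and the declare_side_of helper are assumptions made for this sketch; the REQUIRES check is done by hand because singledispatch has no such feature.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class Rectangle:
    length: float
    width: float


@dataclass(frozen=True)
class Square:
    side: float  # a single component; no redundant width


@dataclass(frozen=True)
class Ellipse:
    semi_major: float
    semi_minor: float


@singledispatch
def area_of(shape):
    raise TypeError(f"no AREA_OF overloading for {type(shape).__name__}")


@singledispatch
def length_of(shape):
    raise TypeError(f"no LENGTH_OF overloading for {type(shape).__name__}")


@singledispatch
def width_of(shape):
    raise TypeError(f"no WIDTH_OF overloading for {type(shape).__name__}")


# Overloadings, each written against its own representation.
area_of.register(Rectangle, lambda r: r.length * r.width)
area_of.register(Square, lambda s: s.side * s.side)
area_of.register(Ellipse, lambda e: math.pi * e.semi_major * e.semi_minor)
length_of.register(Rectangle, lambda r: r.length)
width_of.register(Rectangle, lambda r: r.width)
length_of.register(Square, lambda s: s.side)
width_of.register(Square, lambda s: s.side)

SIDE_OF_TYPES = set()


def declare_side_of(typ):
    """Hand-rolled REQUIRES: typ needs AREA_OF, LENGTH_OF and WIDTH_OF overloadings."""
    for required in (area_of, length_of, width_of):
        if typ not in required.registry:
            raise TypeError(f"SIDE_OF for {typ.__name__} requires {required.__name__}")
    SIDE_OF_TYPES.add(typ)


def side_of(shape):
    if type(shape) not in SIDE_OF_TYPES:
        raise TypeError(f"no SIDE_OF overloading for {type(shape).__name__}")
    return length_of(shape)  # DEFAULT SIDE_OF(x) = LENGTH_OF(x)


declare_side_of(Square)
```

Square exhibits rectangularity here only because length_of and width_of are overloaded for it, not because it shares a representation with Rectangle. Ellipse has an area but cannot be declared to have a side.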

 

Third blunder: each abstraction has exactly one representation in the database/data structure

We might have a variety of entities represented in the database. For definiteness, say we're representing office furniture for a floorplan. Those entities (desks, tables, cupboards, ...) exhibit a 'footprint' which might be rectangular or square. That doesn't mean all the representations have a component shape of type RECTANGLE of which a subset we want to identify as SQUARE. Data modelling just isn't as straitjacketed as that. There might be several representations from which we can abstract rectangularity or squareness.

What we do want to say is: any data structure that represents rectangularity (of footprint) must have such-and-such characteristics; any data structure that represents squareness (of footprint) must have those characteristics plus some. In which "have ... characteristics" we can implement as: supports an API (methods) for obtaining length, width, ....

Consider 'bunching' operators: I'd expect that as well as AREA_OF we might want CENTRE_OF (or more generally CENTROID_OF). So we'd need PossRep(s) that represent 'positioned' rectangles/squares. But maybe not all figures represented in our database are 'positioned', even though they do have an area.

I can choose a compact representation for 'positioned' figures: two points being the bottom-left and top-right corners (assuming all are aligned to the x-y axes). For the IM I can give an alternative PossRep with length, width, centre, alignment. What I can't do for the IM is bring non-positioned figures into the type hierarchy, because centre, alignment don't apply. Neither can I operate over the two-points representation as if it's a plane figure with auxiliary attributes à la Specialisation-by-Extension.

In the office furniture toy application: we have desks, tables, ... stored in the depot/not positioned; we have furniture in situ in an office; we have furniture notionally on a floorplan we're designing. All have FLOOR_AREA; not all have CENTRE_OF or positioned edges etc.

METHOD CENTRE_OF(t) RETURNS POINT
  REQUIRES FLOOR_AREA(t);

How does the IM represent these figures such that FLOOR_AREA is overloaded for all representations, but CENTRE_OF only for some?
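For comparison, the overloading side of that question can be sketched in Python with functools.singledispatch. The StoredItem and PlacedItem types and their fields are hypothetical, chosen to match the depot vs. floorplan example; nothing here says how the IM itself would do it.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class StoredItem:
    """Furniture in the depot: has a footprint but no position."""
    length: float
    width: float


@dataclass(frozen=True)
class PlacedItem:
    """Axis-aligned furniture on a floorplan: bottom-left and top-right corners."""
    x1: float
    y1: float
    x2: float
    y2: float


@singledispatch
def floor_area(item):
    raise TypeError(f"no FLOOR_AREA overloading for {type(item).__name__}")


@floor_area.register(StoredItem)
def _(item):
    return item.length * item.width


@floor_area.register(PlacedItem)
def _(item):
    return (item.x2 - item.x1) * (item.y2 - item.y1)


@singledispatch
def centre_of(item):
    raise TypeError(f"no CENTRE_OF overloading for {type(item).__name__}")


@centre_of.register(PlacedItem)
def _(item):
    # Overloaded only for the positioned representation.
    return ((item.x1 + item.x2) / 2, (item.y1 + item.y2) / 2)
```

Both representations answer floor_area; only PlacedItem answers centre_of, and asking a StoredItem for its centre raises a TypeError at the call.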

Quote from AntC on October 27, 2019, 7:11 am

Sure. Because in our analysis of the 'business domain' for this database/schema, it's useful to treat squares as a form of rectangle. Please now show the declarations for treating ellipses/circles together with rectangles/squares as forms of figures of the plane.

If that's what you want, the bounding box plus a rounded/squared-off indicator will serve.

  • For rectangles/squares, look at these definitions for portrait (adj) vs landscape (adj). Rectangles have those properties; squares do not.
  • Ellipses have major and minor semi-axes; circles do not.

I think that's like saying: "The real sqrt function returns two values, but zero has only one square root, ergo real sqrt does not apply to it."  That's a conceivable treatment, but it's normal to say that zero has two roots that happen to be the same.  By the same token, the two orientations of a square and the semi-axes of a circle are also identical.

So I want to say having "all the operators" is not an intrinsic property, it's an emergent property from our choice of representation. The representation is not forced to have a type RECTANGLE with components length, width. Some representation exhibits rectangularity just in case we can declare/implement methods length, width for its type.

Duck typing, in short.  Which is why TTM says nothing about representations: it only requires you to give a possible representation which the compiler is free to completely ignore if it can prove that its representation has the same representational power (no easy matter, of course, but not impossible either).

No I'm thinking of a non-redundant representation that is not prone to update anomalies. If I know some entity in my database has the property it is necessarily square, I should represent its side as a single component of the data structure.

There are no actual update anomalies with a rectangular representation of squares, because assignments to pseudovariables always assign a distinct value to the underlying variable.  There is no requirement that any possrep is an orthogonal basis; if you have a user-defined type DATE and the assignment D := DATE(2019, 10, 30), then THE_MONTH(D) := 2 ought to provoke an exception.  By the same token, if S is of type SQUARE, then THE_LENGTH(S) := 10 will mean that the value of THE_WIDTH(S) will also be 10 henceforth.  That is no anomaly, but a plain geometrical fact.
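A Python sketch of the reading described in the paragraph above, offered as an illustration of that reading only and not as what TTM prescribes. The Square class and with_length helper are hypothetical; datetime.date.replace is the real standard-library call and does raise ValueError for an impossible date.

```python
import datetime
from dataclasses import dataclass

# Values are immutable: "assigning to a component" builds a new value or fails.
d = datetime.date(2019, 10, 30)
try:
    d.replace(month=2)  # 30 February does not exist
    date_rejected = False
except ValueError:
    date_rejected = True


@dataclass(frozen=True)
class Square:
    """Hypothetical SQUARE value held with separate length and width components."""
    length: int
    width: int

    def __post_init__(self):
        # Selector check: SQUARE(10, 12) is rejected, much as 1/0 is.
        if self.length != self.width:
            raise ValueError("SQUARE requires length = width")


def with_length(s: Square, new_length: int) -> Square:
    """Model of THE_LENGTH(S) := new_length: the new value the variable would hold."""
    return Square(new_length, new_length)


v = Square(10, 10)
v2 = with_length(v, 15)  # a different value; v itself is unchanged
```

The original value v still equals Square(10, 10) afterwards, and v2 equals Square(15, 15); only the variable holding it would change.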

Quote from johnwcowan on November 3, 2019, 7:03 pm

There are no actual update anomalies with a rectangular representation of squares, because assignments to pseudovariables always assign a distinct value to the underlying variable.  There is no requirement that any possrep is an orthogonal basis; if you have a user-defined type DATE and the assignment D := DATE(2019, 10, 30), then THE_MONTH(D) := 2 ought to provoke an exception.  By the same token, if S is of type SQUARE, then THE_LENGTH(S) := 10 will mean that the value of THE_WIDTH(S) will also be 10 henceforth.  That is no anomaly, but a plain geometrical fact.

You are supposing that because that's the solution you might intuitively choose and you think the entire world thinks just the same.  Stop.

Your approach introduces updates that expose side-effects and TTM does not want that.  That's why (if my memory still serves me well) the TTM IM option is to let the compile-time type exception happen (of assigning a value from an expression of declared type RECTANGLE (*) to a SQUARE variable).  Meaning the pseudovariables of the proper supertypes are essentially simply unavailable.  Meaning you can only write THE_SIDE(S) := 10 (with the appropriate effect on LENGTH and WIDTH) but not THE_LENGTH(S) := 10.

(*) that expression being RECTANGLE(10, THE_WIDTH(S)).  If the programmer wrote it like that, he'd have to add a cast for it to compile, and now he's exposed to runtime type exceptions.  But the pseudovariable assignment syntax does not lend itself easily to including a cast.

Quote from Erwin on November 4, 2019, 11:50 am

Your approach introduces updates that expose side-effects and TTM does not want that.

As I said, there are no updates and no side effects.  You may not like the idea of implementing squares with separate LENGTH and WIDTH fields, but there's nothing inconsistent about it.  If you use a SQUARE(10,12) selector, you get a run-time exception similar to the exception you get from 1/0.  If you assign SQUARE(10,10) to a variable V, then THE_LENGTH(V) := 15 assigns a new value (not an updated value; there is no such thing in TTM, only updated variables) that is equal to the value of the selector invocation SQUARE(15, 15).  All this is exactly what logic demands.

Now it's easy to see that SQUARE implemented like this is a subtype of RECTANGLE implemented in the same way.  If RECTANGLE in TTM had mutable instances as java.awt.Rectangle does, it would be another story due to contravariance.  "If it was so, it might be; and if it were so, it would be; but as it isn't, it ain't. That's logic."

That's why (if my memory still serves me well) the TTM IM option is to let the compile-time type exception happen (of assigning a value from an expression of declared type RECTANGLE (*) to a SQUARE variable).

Certainly.  A variable of declared type SQUARE can only hold a square or a subtype; it can't possibly hold a supertype.

Meaning the pseudovariables of the proper supertypes are essentially simply unavailable.  Meaning you can only write THE_SIDE(S) := 10 (with the appropriate effect on LENGTH and WIDTH) but not THE_LENGTH(S) := 10.

That's true if you decide to specify a possrep for SQUARE whose only component is SIDE.  My point is that that is a choice and there are other choices.

 
