## The Forum for Discussion about The Third Manifesto and Related Matters

Forum breadcrumbs - You are here:
Please or Register to create posts and topics.

# Type checking/inference, practical application?

PreviousPage 4 of 4
Quote from dandl on July 9, 2021, 12:57 am
Quote from AntC on July 8, 2021, 12:37 pm
Quote from dandl on July 8, 2021, 8:51 am

... Note that although AA claims to reduce the count from 6 to 5, it depends on relcons to do so, and both this and the passing mention of aggregation fall well short of formal specification.

Appendix A has a specification of `SUMMARIZE` and `GROUP` in terms of the operators previously defined. What's not formal enough about it?

The specification goes as far as <summary spec> on page a.18 but no further.  Functions such as min/max/sum and generic aggregation as per TTM RM Pre 6 are not specified here, but they are in mine.

As well as the algebra I would include `DUM` and relational assignment in any language based on it.

If Tropas[h]ko has a full set of definitions I would be interested to see them.

• Tropashko `^` aka lattice meet aka `NatJoin` specified as per Appendix A `<AND>`.

(So this caters for Codd's `Intersection`, `TIMES` as special cases -- same approach as Appendix A.)

• Tropashko `v` aka lattice join aka `InnerUnion` specified as follows (using the style of FORMAL DEFINITIONS in Appendix A):
• Let `s` be `r1 InnerUnion r2`. It is required that if `<A, T1> ∈ Hr1` and `<A, T2> ∈ Hr2`, then `T1 = T2`.

Hs = Hr1 intersection Hr2

Bs = {ts : (exists tr1 and tr1 ∈ Br1 and ts ⊆ tr1) or (exists tr2 and tr2 ∈ Br2 and ts ⊆ tr2) }

The condition on headings `Hr1, Hr2` is the same as Appendix A's condition on headings for `<AND>, <OR>` aka 'join-compatible'.
The `⊆`  condition on the tuples in the bodies is taking advantage of Appendix A/TTM defining tuples as sets. This makes the definition domain-independent, like Codd's `UNION`, unlike `<OR>`.

Tutorial D's `UNION` equivalent to Codd's `UNION` is `InnerUnion` specialized for operands of the same heading -- same as how Appendix A handles `UNION` using `<OR>`.

`InnerUnion` is equivalent to SQL `UNION CORRESPONDING` or ISBL `UNION`.

`InnerUnion` can be defined equationally in terms of `NatJoin`, so is (arguably/controversially) not 'basic' but derived.

• `DUM` aka `R00` is defined as per Appendix A/TTM: the relation with empty heading, empty body: `REL{} {}`.
• For projection, note that the Tropashko approach uses only relation values, there's no separate mathematical 'sort' of attribute names or attribute-comma-list. So
• `r1 project r2 =df r1 InnerUnion (r2 NatJoin R00)` -- so a derived operator.
• Note that `r2` is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in `r1` -- or indeed its heading might be disjoint from `r1`, in which case the `project` = `r1 InnerUnion R00`.
• For  semijoin
• `r1 MATCHING r2 =df (r1 NatJoin r2) InnerUnion (r1 NatJoin R00)` -- so a derived operator.
• For anti(semi)join, `NOTMATCHING` behaves as per Appendix A. It is the solution to the simultaneous equations  --  so  a derived operator.
• `(r1 NOTMATCHING r2) InnerUnion (r1 NatJoin r2) = r1`
• `(r1 NOTMATCHING r2) NatJoin (r1 NatJoin r2) = r1 NatJoin r2 NatJoin R00`
• Also: `r1 MATCHING r2 = r1 NOTMATCHING (r1 NOTMATCHING r2)` -- generalised from McGoveran & Date 1994
• For restriction/`WHERE`, use relcons with `NatJoin` -- same approach as Appendix A.
• For `EXTEND`, use relcons with `NatJoin` -- same approach as Appendix A.
• For `RENAME`, dispense with it using relcons with `COMPOSE` -- same approach as Appendix A. And I am dubious about that approach; but it's no worse than Appendix A.
• `r1 COMPOSE r2 =df (r1 NatJoin r2) REMOVE (r1 InnerUnion r2)` -- so a derived operator; not a "macro" operator -- whatever Appendix A's scare quotes are trying to convey.
• `r1 REMOVE r2` is `r1` projected on the attributes not in common with `r2`. Note that `r2` is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in `r1` -- or indeed its heading might be disjoint from `r1`, in which case the `REMOVE` is a no-op. Defined equationally, compare the definition for `NOTMATCHING` -- so a derived operator:
• `r1 NatJoin (r1 REMOVE r2) = r1`
• `r1 InnerUnion (r1 REMOVE r2) = r1 REMOVE r2`
• `(r1 REMOVE r2) InnerUnion r2 = r1 InnerUnion R00`
• `r1 project r2 = r1 REMOVE (r1 REMOVE r2)`
• (We believe these equations sufficient to define `REMOVE` uniquely -- just as soon as we can equationally define `R00` uniquely.)
• Or if you've been following along the thread on complements and pseudocomplements, `r1 REMOVE r2 = r1 InnerUnion hComp(r2)`, in which `hComp( )` is the (unique) relative complement of `r2 NatJoin R00` within the convex sublattice bounded by the interval `[R10, R00]`. (Where `R10` is lattice bottom aka `Dumpty`.)
• For `<NOT>`, Clayden's Tropashko system does not include it -- preferring `NOTMATCHING` instead. (Note that `NOTMATCHING` is a generalisation of Codd 1972's `MINUS` and Appendix A requires `<NOT>` to appear only within a conjunction in the surface expression, which amounts to `NOTMATCHING`.)

If you've been following along the thread on complements and pseudocomplements, `<NOT>` is the (unique) relative complement within the convex sublattice bounded by the interval `[r1 NatJoin R00, r1 InnerUnion R11]`. (Where `R11` aka fullest-possible-relation is the (unique) absolute pseudocomplement of `R00`.)

So we've arrived at equivalent expressive power to the A algebra (but with `InnerUnion` instead of `<OR>`). If I've left out any operators, they can be derived following Appendix A's equivalences.

I agree. That's taken quite a bit of effort: thank you.

• So the final body count is: NatJoin, InnerUnion, NOTMATCHING, relcons (plus TCLOSE)?
• DUM is a relcon, why define it separately?

Because a) it's needed to define the operators, b) it's domain-independent. Whereas (say) `PLUS` needs to be specific to number types.

• as far as I was aware, AA does not dispense with RENAME (see p a.7)

Page 10 dispenses with RENAME.

• AA does discuss further reductions (see pa.8) which would place in on much the same footing.

This seems to be on a par with AA but does not formally treat relcons, aggregation or transitive closure. So the expressive power is on a par with AA (except for TCLOSE), and lower than the ERA I have presented.

The issue here is to refute your claim about the (basic/Extended) RA. There is no agreement as to whether aggregation or TCLOSE are even desirable in a DML. (Codd thought they weren't.) They're included in Appendix A because it's giving the semantics underpinning Tutorial D, which is aimed at being a general-purpose programming environment, not specifically an algebra/DML. You seem to be trying to turn this into some sort of competition. I'm not interested in playing those games.

An algebra with 20 or a hundred operators would be just as valid -- provided it met the criterion of expressive completeness. If there's anything to choose between algebras, it's on grounds of ergonomics or elegance or some such taste-based/non-disputable basis.

But overall the similarities are more striking than the differences. Are these different algebras, or just variations on a theme?

How many ways round do we have to repeat this to you? The point is expressive completeness. The A algebra has the same expressivity as Codd+RENAME, same as Tropashko's. You could take a different set of operators as basic, providing you could define the remaining operators in terms of them. Then what does "different algebras" mean? "just" and "variations" is a matter of opinion.

Quote from dandl on July 8, 2021, 8:51 am

... [Datalog lives on another planet.]

As well as the algebra I would include ... relational assignment in any language based on it.

Comments like that lead me to suspect you are labouring under a misapprehension that an algebra is some variety of programming language. Did you study algebra in school? Was there anything like assignment? To be meaningful as assignment, there would have to be behind that a notion of flow-of-control and variables that hold different values at different times. Did school algebra have any of that?

When we write in an algebra `x = y + z`, we do not mean 'evaluate `y`; evaluate `z`; add the values; put the result in `x`'. That `=` is not `:=`; it's more like `==`: it's making an assertion about what the values `x, y, z` range over/could be a definition for `x`. Similarly when Appendix A FORMAL DEFINITIONS uses setbuilder notation, that's not an instruction how to construct some result, it's expressing a constraint on any candidate tuple `ts` that complies with the heading `Hs`.

The A algebra was not intended to be executable/implementable. To D&D's credit, Appendix A reads like it's describing something implementable.

Similarly for you to talk about "treat[ing] relcons" suggests you misapprehend. Relational values exist, according to the definitions in TTM. That's all an algebra needs to know. It's like you're carrying the ball over the goal line and grounding it, expecting to be awarded 5 points. Whereas everybody else is playing Soccer; carrying the ball is not part of the game.

Datalog is very much on the same planet; indeed in the same ballpark -- not, evidently, the ballpark you're in. In fact Datalog counts as another possible algebra. Explanation:

• "The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, " [wikipedia].
• Datalog expresses first-order predicate logic. (Well, subject to some limitations.) It's declarative, not procedural, not a Functional Programming language. More specifically -- and this explains how Codd ended up with his set of operators:
• The body of a relation represents the extension of some 'characteristic predicate'.
• The operators of an algebra must at least express all possible logical connectives between those characteristic predicates of relations.
• `UNION` of the bodies of relations corresponds to `OR` of the predicates.
• `INTERSECTION` of the bodies of relations corresponds to `AND` of the predicates.
• `MINUS` of the bodies of relations corresponds to `AND NOT ...` of the predicates.
• projection -- or rather the equivalent `REMOVE` over the bodies of relations corresponds to `existential quantification` of the removed attribute over the predicate.
• Datalog doesn't implement RENAME, because it uses a positional perspective.
• There's an implicit `universal quantification` of all free attributes of an expression.
• restriction corresponds to applying (`AND`ing) domain-specific filters/comparisons to a predicate -- that is, Codd's `theta`s.

Ad hominem remarks noted and deleted.

I agree. That's taken quite a bit of effort: thank you.

• So the final body count is: NatJoin, InnerUnion, NOTMATCHING, relcons (plus TCLOSE)?
• DUM is a relcon, why define it separately?

Because a) it's needed to define the operators, b) it's domain-independent. Whereas (say) `PLUS` needs to be specific to number types.

DUM is not part of the algebra, it's a relcon. Using my formal specification of relcons, it corresponds to a null function. It would seem my proposal is the more general.

• as far as I was aware, AA does not dispense with RENAME (see p a.7)

Page 10 dispenses with RENAME.

AA contains a lot of discussion about what might or could be done, but the formal specification includes RENAME on p a.13.

The point is expressive completeness. The A algebra has the same expressivity as Codd+RENAME, same as Tropashko's. You could take a different set of operators as basic, providing you could define the remaining operators in terms of them. Then what does "different algebras" mean? "just" and "variations" is a matter of opinion.

On the importance of expressiveness as a metric we agree. There are queries that can be expressed:

• in Tropashko but not in Codd+ (selection and extend via relcons)
• in AA but not in Tropashko  (TCLOSE, canned aggregation(*))
• in my Andl ERA but not in AA (ETCLOSE, generalised aggregation(*), GTC)
• in SQL but not in my Andl ERA (outer joins, ordered and recursive queries).

(*) TD includes aggregation based on 'canned' functions, but not generalised aggregation (so it does not comply with TTM Pre 6). AA purports to include aggregation as per TD. While I do not find the arguments convincing, I think we can agree AA does not define generalised aggregation. My ERA does.

As well, my ERA proposal defines ETCLOSE and GTC based on that. My ERA is the most expressive of any that I know, but still cannot express every query in SQL. That would be my aim.

Andl - A New Database Language - andl.org
To disagreem is not to show ignorance -- another fallacy.

Similarly for you to talk about "treat[ing] relcons" suggests you misapprehend. Relational values exist, according to the definitions in TTM.

So let me be crystal clear. My proposal is in 3 separate parts.

1. My Andl ERA is an algebra, with formal specifications based on conventional maths (set-builder notation).
2. I have included a formal specification for relational constants, based on functions, using the same kind of maths.
3. I have suggested it be accompanied by a formal specification of relational assignment, for the benefit of language implementors.

Datalog is very much on the same planet; indeed in the same ballpark -- not, evidently, the ballpark you're in. In fact Datalog counts as another possible algebra. Explanation:

Datalog is a declarative logic programming language that syntactically is a subset of Prolog.Wikipedia, obviously. A language is not an algebra. QED.

There is no single, final, definitive version:

• recursion can be fixed-point or unbounded
• negation can come in various flavours
• there are problems including aggregation.

However, a typical modern variant has a very high level of expressive power, higher than any RA listed previously (because it includes recursive queries), most likely at the same level as SQL.

Andl - A New Database Language - andl.org
PreviousPage 4 of 4