Type checking/inference, practical application?

#31 · July 9, 2021, 3:09 am

Quote from AntC on July 9, 2021, 3:09 am

Quote from dandl on July 9, 2021, 12:57 am

Quote from AntC on July 8, 2021, 12:37 pm

Quote from dandl on July 8, 2021, 8:51 am

... Note that although AA claims to reduce the count from 6 to 5, it depends on relcons to do so, and both this and the passing mention of aggregation fall well short of formal specification.

Appendix A has a specification of SUMMARIZE and GROUP in terms of the operators previously defined. What's not formal enough about it?

The specification goes as far as <summary spec> on page a.18 but no further. Functions such as min/max/sum and generic aggregation as per TTM RM Pre 6 are not specified here, but they are in mine.

As well as the algebra I would include DUM and relational assignment in any language based on it.

If Tropas[h]ko has a full set of definitions I would be interested to see them.

Tropashko ^ aka lattice meet aka NatJoin specified as per Appendix A <AND>.

(So this caters for Codd's Intersection, TIMES as special cases -- same approach as Appendix A.)

Tropashko v aka lattice join aka InnerUnion specified as follows (using the style of FORMAL DEFINITIONS in Appendix A):

Let s be r1 InnerUnion r2. It is required that if <A, T1> ∈ Hr1 and <A, T2> ∈ Hr2, then T1 = T2.

Hs = Hr1 intersection Hr2

Bs = {ts : (exists tr1 and tr1 ∈ Br1 and ts ⊆ tr1) or (exists tr2 and tr2 ∈ Br2 and ts ⊆ tr2) }

The condition on headings Hr1, Hr2 is the same as Appendix A's condition on headings for <AND>, <OR> aka 'join-compatible'.
The ⊆ condition on the tuples in the bodies is taking advantage of Appendix A/TTM defining tuples as sets. This makes the definition domain-independent, like Codd's UNION, unlike <OR>.

Tutorial D's UNION equivalent to Codd's UNION is InnerUnion specialized for operands of the same heading -- same as how Appendix A handles UNION using <OR>.

InnerUnion is equivalent to SQL UNION CORRESPONDING or ISBL UNION.

InnerUnion can be defined equationally in terms of NatJoin, so is (arguably/controversially) not 'basic' but derived.

DUM aka R00 is defined as per Appendix A/TTM: the relation with empty heading, empty body: REL{} {}.

For projection, note that the Tropashko approach uses only relation values, there's no separate mathematical 'sort' of attribute names or attribute-comma-list. So

r1 project r2 =df r1 InnerUnion (r2 NatJoin R00) -- so a derived operator.

Note that r2 is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in r1 -- or indeed its heading might be disjoint from r1, in which case the project = r1 InnerUnion R00.

For semijoin

r1 MATCHING r2 =df (r1 NatJoin r2) InnerUnion (r1 NatJoin R00) -- so a derived operator.

For anti(semi)join, NOTMATCHING behaves as per Appendix A. It is the solution to the simultaneous equations -- so a derived operator.

(r1 NOTMATCHING r2) InnerUnion (r1 NatJoin r2) = r1

(r1 NOTMATCHING r2) NatJoin (r1 NatJoin r2) = r1 NatJoin r2 NatJoin R00

Also: r1 MATCHING r2 = r1 NOTMATCHING (r1 NOTMATCHING r2) -- generalised from McGoveran & Date 1994

For restriction/WHERE, use relcons with NatJoin -- same approach as Appendix A.

For EXTEND, use relcons with NatJoin -- same approach as Appendix A.

For RENAME, dispense with it using relcons with COMPOSE -- same approach as Appendix A. And I am dubious about that approach; but it's no worse than Appendix A.

r1 COMPOSE r2 =df (r1 NatJoin r2) REMOVE (r1 InnerUnion r2) -- so a derived operator; not a "macro" operator -- whatever Appendix A's scare quotes are trying to convey.

r1 REMOVE r2 is r1 projected on the attributes not in common with r2. Note that r2 is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in r1 -- or indeed its heading might be disjoint from r1, in which case the REMOVE is a no-op. Defined equationally, compare the definition for NOTMATCHING -- so a derived operator:

r1 NatJoin (r1 REMOVE r2) = r1

r1 InnerUnion (r1 REMOVE r2) = r1 REMOVE r2

(r1 REMOVE r2) InnerUnion r2 = r1 InnerUnion R00

r1 project r2 = r1 REMOVE (r1 REMOVE r2)

(We believe these equations sufficient to define REMOVE uniquely -- just as soon as we can equationally define R00 uniquely.)

Or if you've been following along the thread on complements and pseudocomplements, r1 REMOVE r2 = r1 InnerUnion hComp(r2), in which hComp( ) is the (unique) relative complement of r2 NatJoin R00 within the convex sublattice bounded by the interval [R10, R00]. (Where R10 is lattice bottom aka Dumpty.)

For <NOT>, Clayden's Tropashko system does not include it -- preferring NOTMATCHING instead. (Note that NOTMATCHING is a generalisation of Codd 1972's MINUS and Appendix A requires <NOT> to appear only within a conjunction in the surface expression, which amounts to NOTMATCHING.)

If you've been following along the thread on complements and pseudocomplements, <NOT> is the (unique) relative complement within the convex sublattice bounded by the interval [r1 NatJoin R00, r1 InnerUnion R11]. (Where R11 aka fullest-possible-relation is the (unique) absolute pseudocomplement of R00.)

So we've arrived at equivalent expressive power to the A algebra (but with InnerUnion instead of <OR>). If I've left out any operators, they can be derived following Appendix A's equivalences.

I agree. That's taken quite a bit of effort: thank you.

So the final body count is: NatJoin, InnerUnion, NOTMATCHING, relcons (plus TCLOSE)?

DUM is a relcon, why define it separately?

Because a) it's needed to define the operators, b) it's domain-independent. Whereas (say) PLUS needs to be specific to number types.

as far as I was aware, AA does not dispense with RENAME (see p a.7)

Page 10 dispenses with RENAME.

AA does discuss further reductions (see pa.8) which would place in on much the same footing.

This seems to be on a par with AA but does not formally treat relcons, aggregation or transitive closure. So the expressive power is on a par with AA (except for TCLOSE), and lower than the ERA I have presented.

The issue here is to refute your claim about the (basic/Extended) RA. There is no agreement as to whether aggregation or TCLOSE are even desirable in a DML. (Codd thought they weren't.) They're included in Appendix A because it's giving the semantics underpinning Tutorial D, which is aimed at being a general-purpose programming environment, not specifically an algebra/DML. You seem to be trying to turn this into some sort of competition. I'm not interested in playing those games.

An algebra with 20 or a hundred operators would be just as valid -- provided it met the criterion of expressive completeness. If there's anything to choose between algebras, it's on grounds of ergonomics or elegance or some such taste-based/non-disputable basis.

But overall the similarities are more striking than the differences. Are these different algebras, or just variations on a theme?

How many ways round do we have to repeat this to you? The point is expressive completeness. The A algebra has the same expressivity as Codd+RENAME, same as Tropashko's. You could take a different set of operators as basic, providing you could define the remaining operators in terms of them. Then what does "different algebras" mean? "just" and "variations" is a matter of opinion.

Quote from dandl on July 9, 2021, 12:57 am

Quote from AntC on July 8, 2021, 12:37 pm

Quote from dandl on July 8, 2021, 8:51 am

... Note that although AA claims to reduce the count from 6 to 5, it depends on relcons to do so, and both this and the passing mention of aggregation fall well short of formal specification.

Appendix A has a specification of SUMMARIZE and GROUP in terms of the operators previously defined. What's not formal enough about it?

The specification goes as far as <summary spec> on page a.18 but no further. Functions such as min/max/sum and generic aggregation as per TTM RM Pre 6 are not specified here, but they are in mine.

As well as the algebra I would include DUM and relational assignment in any language based on it.

If Tropas[h]ko has a full set of definitions I would be interested to see them.

Tropashko ^ aka lattice meet aka NatJoin specified as per Appendix A <AND>.

(So this caters for Codd's Intersection, TIMES as special cases -- same approach as Appendix A.)

Tropashko v aka lattice join aka InnerUnion specified as follows (using the style of FORMAL DEFINITIONS in Appendix A):

Let s be r1 InnerUnion r2. It is required that if <A, T1> ∈ Hr1 and <A, T2> ∈ Hr2, then T1 = T2.

Hs = Hr1 intersection Hr2

Bs = {ts : (exists tr1 and tr1 ∈ Br1 and ts ⊆ tr1) or (exists tr2 and tr2 ∈ Br2 and ts ⊆ tr2) }

The condition on headings Hr1, Hr2 is the same as Appendix A's condition on headings for <AND>, <OR> aka 'join-compatible'.
The ⊆ condition on the tuples in the bodies is taking advantage of Appendix A/TTM defining tuples as sets. This makes the definition domain-independent, like Codd's UNION, unlike <OR>.

Tutorial D's UNION equivalent to Codd's UNION is InnerUnion specialized for operands of the same heading -- same as how Appendix A handles UNION using <OR>.

InnerUnion is equivalent to SQL UNION CORRESPONDING or ISBL UNION.

InnerUnion can be defined equationally in terms of NatJoin, so is (arguably/controversially) not 'basic' but derived.

DUM aka R00 is defined as per Appendix A/TTM: the relation with empty heading, empty body: REL{} {}.

For projection, note that the Tropashko approach uses only relation values, there's no separate mathematical 'sort' of attribute names or attribute-comma-list. So

r1 project r2 =df r1 InnerUnion (r2 NatJoin R00) -- so a derived operator.

Note that r2 is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in r1 -- or indeed its heading might be disjoint from r1, in which case the project = r1 InnerUnion R00.

For semijoin

r1 MATCHING r2 =df (r1 NatJoin r2) InnerUnion (r1 NatJoin R00) -- so a derived operator.

For anti(semi)join, NOTMATCHING behaves as per Appendix A. It is the solution to the simultaneous equations -- so a derived operator.

(r1 NOTMATCHING r2) InnerUnion (r1 NatJoin r2) = r1

(r1 NOTMATCHING r2) NatJoin (r1 NatJoin r2) = r1 NatJoin r2 NatJoin R00

Also: r1 MATCHING r2 = r1 NOTMATCHING (r1 NOTMATCHING r2) -- generalised from McGoveran & Date 1994

For restriction/WHERE, use relcons with NatJoin -- same approach as Appendix A.

For EXTEND, use relcons with NatJoin -- same approach as Appendix A.

For RENAME, dispense with it using relcons with COMPOSE -- same approach as Appendix A. And I am dubious about that approach; but it's no worse than Appendix A.

r1 COMPOSE r2 =df (r1 NatJoin r2) REMOVE (r1 InnerUnion r2) -- so a derived operator; not a "macro" operator -- whatever Appendix A's scare quotes are trying to convey.

r1 REMOVE r2 is r1 projected on the attributes not in common with r2. Note that r2 is unrestricted apart from needing to be join-compatible so might have attributes that don't appear in r1 -- or indeed its heading might be disjoint from r1, in which case the REMOVE is a no-op. Defined equationally, compare the definition for NOTMATCHING -- so a derived operator:

r1 NatJoin (r1 REMOVE r2) = r1

r1 InnerUnion (r1 REMOVE r2) = r1 REMOVE r2

(r1 REMOVE r2) InnerUnion r2 = r1 InnerUnion R00

r1 project r2 = r1 REMOVE (r1 REMOVE r2)

(We believe these equations sufficient to define REMOVE uniquely -- just as soon as we can equationally define R00 uniquely.)

Or if you've been following along the thread on complements and pseudocomplements, r1 REMOVE r2 = r1 InnerUnion hComp(r2), in which hComp( ) is the (unique) relative complement of r2 NatJoin R00 within the convex sublattice bounded by the interval [R10, R00]. (Where R10 is lattice bottom aka Dumpty.)

For <NOT>, Clayden's Tropashko system does not include it -- preferring NOTMATCHING instead. (Note that NOTMATCHING is a generalisation of Codd 1972's MINUS and Appendix A requires <NOT> to appear only within a conjunction in the surface expression, which amounts to NOTMATCHING.)

If you've been following along the thread on complements and pseudocomplements, <NOT> is the (unique) relative complement within the convex sublattice bounded by the interval [r1 NatJoin R00, r1 InnerUnion R11]. (Where R11 aka fullest-possible-relation is the (unique) absolute pseudocomplement of R00.)

So we've arrived at equivalent expressive power to the A algebra (but with InnerUnion instead of <OR>). If I've left out any operators, they can be derived following Appendix A's equivalences.

I agree. That's taken quite a bit of effort: thank you.

So the final body count is: NatJoin, InnerUnion, NOTMATCHING, relcons (plus TCLOSE)?

DUM is a relcon, why define it separately?

Because a) it's needed to define the operators, b) it's domain-independent. Whereas (say) PLUS needs to be specific to number types.

as far as I was aware, AA does not dispense with RENAME (see p a.7)

Page 10 dispenses with RENAME.

AA does discuss further reductions (see pa.8) which would place in on much the same footing.

This seems to be on a par with AA but does not formally treat relcons, aggregation or transitive closure. So the expressive power is on a par with AA (except for TCLOSE), and lower than the ERA I have presented.

The issue here is to refute your claim about the (basic/Extended) RA. There is no agreement as to whether aggregation or TCLOSE are even desirable in a DML. (Codd thought they weren't.) They're included in Appendix A because it's giving the semantics underpinning Tutorial D, which is aimed at being a general-purpose programming environment, not specifically an algebra/DML. You seem to be trying to turn this into some sort of competition. I'm not interested in playing those games.

An algebra with 20 or a hundred operators would be just as valid -- provided it met the criterion of expressive completeness. If there's anything to choose between algebras, it's on grounds of ergonomics or elegance or some such taste-based/non-disputable basis.

But overall the similarities are more striking than the differences. Are these different algebras, or just variations on a theme?

How many ways round do we have to repeat this to you? The point is expressive completeness. The A algebra has the same expressivity as Codd+RENAME, same as Tropashko's. You could take a different set of operators as basic, providing you could define the remaining operators in terms of them. Then what does "different algebras" mean? "just" and "variations" is a matter of opinion.

#32 · July 9, 2021, 6:18 am

Quote from AntC on July 9, 2021, 6:18 am

Quote from dandl on July 8, 2021, 8:51 am

... [Datalog lives on another planet.]

As well as the algebra I would include ... relational assignment in any language based on it.

Comments like that lead me to suspect you are labouring under a misapprehension that an algebra is some variety of programming language. Did you study algebra in school? Was there anything like assignment? To be meaningful as assignment, there would have to be behind that a notion of flow-of-control and variables that hold different values at different times. Did school algebra have any of that?

When we write in an algebra x = y + z, we do not mean 'evaluate y; evaluate z; add the values; put the result in x'. That = is not :=; it's more like ==: it's making an assertion about what the values x, y, z range over/could be a definition for x. Similarly when Appendix A FORMAL DEFINITIONS uses setbuilder notation, that's not an instruction how to construct some result, it's expressing a constraint on any candidate tuple ts that complies with the heading Hs.

The A algebra was not intended to be executable/implementable. To D&D's credit, Appendix A reads like it's describing something implementable.

Similarly for you to talk about "treat[ing] relcons" suggests you misapprehend. Relational values exist, according to the definitions in TTM. That's all an algebra needs to know. It's like you're carrying the ball over the goal line and grounding it, expecting to be awarded 5 points. Whereas everybody else is playing Soccer; carrying the ball is not part of the game.

Datalog is very much on the same planet; indeed in the same ballpark -- not, evidently, the ballpark you're in. In fact Datalog counts as another possible algebra. Explanation:

"The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, " [wikipedia].

Datalog expresses first-order predicate logic. (Well, subject to some limitations.) It's declarative, not procedural, not a Functional Programming language. More specifically -- and this explains how Codd ended up with his set of operators:

The body of a relation represents the extension of some 'characteristic predicate'.

The operators of an algebra must at least express all possible logical connectives between those characteristic predicates of relations.

UNION of the bodies of relations corresponds to OR of the predicates.

INTERSECTION of the bodies of relations corresponds to AND of the predicates.

MINUS of the bodies of relations corresponds to AND NOT ... of the predicates.

projection -- or rather the equivalent REMOVE over the bodies of relations corresponds to existential quantification of the removed attribute over the predicate.

Datalog doesn't implement RENAME, because it uses a positional perspective.

There's an implicit universal quantification of all free attributes of an expression.

restriction corresponds to applying (ANDing) domain-specific filters/comparisons to a predicate -- that is, Codd's thetas.

Quote from dandl on July 8, 2021, 8:51 am

... [Datalog lives on another planet.]

As well as the algebra I would include ... relational assignment in any language based on it.

Comments like that lead me to suspect you are labouring under a misapprehension that an algebra is some variety of programming language. Did you study algebra in school? Was there anything like assignment? To be meaningful as assignment, there would have to be behind that a notion of flow-of-control and variables that hold different values at different times. Did school algebra have any of that?

When we write in an algebra x = y + z, we do not mean 'evaluate y; evaluate z; add the values; put the result in x'. That = is not :=; it's more like ==: it's making an assertion about what the values x, y, z range over/could be a definition for x. Similarly when Appendix A FORMAL DEFINITIONS uses setbuilder notation, that's not an instruction how to construct some result, it's expressing a constraint on any candidate tuple ts that complies with the heading Hs.

The A algebra was not intended to be executable/implementable. To D&D's credit, Appendix A reads like it's describing something implementable.

Similarly for you to talk about "treat[ing] relcons" suggests you misapprehend. Relational values exist, according to the definitions in TTM. That's all an algebra needs to know. It's like you're carrying the ball over the goal line and grounding it, expecting to be awarded 5 points. Whereas everybody else is playing Soccer; carrying the ball is not part of the game.

Datalog is very much on the same planet; indeed in the same ballpark -- not, evidently, the ballpark you're in. In fact Datalog counts as another possible algebra. Explanation:

"The relational model (RM) for database management is an approach to managing data using a structure and language consistent with first-order predicate logic, " [wikipedia].
Datalog expresses first-order predicate logic. (Well, subject to some limitations.) It's declarative, not procedural, not a Functional Programming language. More specifically -- and this explains how Codd ended up with his set of operators:
The body of a relation represents the extension of some 'characteristic predicate'.
The operators of an algebra must at least express all possible logical connectives between those characteristic predicates of relations.
UNION of the bodies of relations corresponds to OR of the predicates.
INTERSECTION of the bodies of relations corresponds to AND of the predicates.
MINUS of the bodies of relations corresponds to AND NOT ... of the predicates.
projection -- or rather the equivalent REMOVE over the bodies of relations corresponds to existential quantification of the removed attribute over the predicate.
Datalog doesn't implement RENAME, because it uses a positional perspective.
There's an implicit universal quantification of all free attributes of an expression.
restriction corresponds to applying (ANDing) domain-specific filters/comparisons to a predicate -- that is, Codd's thetas.

#33 · July 10, 2021, 1:15 am

Ad hominem remarks noted and deleted.

I agree. That's taken quite a bit of effort: thank you.

So the final body count is: NatJoin, InnerUnion, NOTMATCHING, relcons (plus TCLOSE)?

DUM is a relcon, why define it separately?

Because a) it's needed to define the operators, b) it's domain-independent. Whereas (say) PLUS needs to be specific to number types.

DUM is not part of the algebra, it's a relcon. Using my formal specification of relcons, it corresponds to a null function. It would seem my proposal is the more general.

as far as I was aware, AA does not dispense with RENAME (see p a.7)

Page 10 dispenses with RENAME.

AA contains a lot of discussion about what might or could be done, but the formal specification includes RENAME on p a.13.

The point is expressive completeness. The A algebra has the same expressivity as Codd+RENAME, same as Tropashko's. You could take a different set of operators as basic, providing you could define the remaining operators in terms of them. Then what does "different algebras" mean? "just" and "variations" is a matter of opinion.

On the importance of expressiveness as a metric we agree. There are queries that can be expressed:

in Tropashko but not in Codd+ (selection and extend via relcons)
in AA but not in Tropashko (TCLOSE, canned aggregation(*))
in my Andl ERA but not in AA (ETCLOSE, generalised aggregation(*), GTC)
in SQL but not in my Andl ERA (outer joins, ordered and recursive queries).

(*) TD includes aggregation based on 'canned' functions, but not generalised aggregation (so it does not comply with TTM Pre 6). AA purports to include aggregation as per TD. While I do not find the arguments convincing, I think we can agree AA does not define generalised aggregation. My ERA does.

As well, my ERA proposal defines ETCLOSE and GTC based on that. My ERA is the most expressive of any that I know, but still cannot express every query in SQL. That would be my aim.

Andl - A New Database Language - andl.org

#34 · July 10, 2021, 1:40 am

Ad hominem comments noted and deleted.

To disagreem is not to show ignorance -- another fallacy.

Similarly for you to talk about "treat[ing] relcons" suggests you misapprehend. Relational values exist, according to the definitions in TTM.

So let me be crystal clear. My proposal is in 3 separate parts.

My Andl ERA is an algebra, with formal specifications based on conventional maths (set-builder notation).
I have included a formal specification for relational constants, based on functions, using the same kind of maths.
I have suggested it be accompanied by a formal specification of relational assignment, for the benefit of language implementors.

Datalog is very much on the same planet; indeed in the same ballpark -- not, evidently, the ballpark you're in. In fact Datalog counts as another possible algebra. Explanation:

Datalog is a declarative logic programming language that syntactically is a subset of Prolog.Wikipedia, obviously. A language is not an algebra. QED.

There is no single, final, definitive version:

recursion can be fixed-point or unbounded
negation can come in various flavours
there are problems including aggregation.

However, a typical modern variant has a very high level of expressive power, higher than any RA listed previously (because it includes recursive queries), most likely at the same level as SQL.

Andl - A New Database Language - andl.org

TTM Forum

The Forum for Discussion about The Third Manifesto and Related Matters