The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

An alternative to open expressions

12
Quote from dandl on July 23, 2020, 2:15 pm
Quote from AntC on July 23, 2020, 11:59 am
Quote from dandl on July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value.

HHT 1975 (and Appendix A) see that as one algorithmic relation (or one relcon) with two uses. Supposing we have to hand a Appendix A-style relcon PLUS with attributes {X, Y, Z} such that X + Y == Z for all tuples. Consider two queries:

  • Given relvar R1 with attributes {X, Y, Z} amongst others: R1 WHERE (X + Y) = Z
  • Given relvar R2 with attributes {X, Y} amongst others, but lacking Z: EXTEND R2 : { Z := X + Y}

I understand the point and I considered it carefully. While there is a superficial attraction in using the same relcon for two different purposes, it only works in practice for a narrow set of cases. I see no way to use it as the basis for a construct such as WHERE STATUS >= 20. If you have an answer for that, I'd be interested.

Same answer. Supposing we have a Appendix A-style relcon GE with attributes {X, Y} such that X >= Y for all tuples.

  • S <AND> RENAME ((GE <AND> REL{TUP{Y 20}}) {ALL BUT Y}) : {STATUS := X}

Is it worth some different machinery for WHERE conditions mentioning only a single attribute? I can't see why: again use <AND> to a single-attribute relcon.

That's exactly what I propose. What do you propose differently?

Note ((GE <AND> REL{TUP{Y 20}}) {ALL BUT Y}) is exactly such a single-attribute relcon.

I am struck by the ghastly thought you are proposing whatever you're proposing as an implementation. No. It will be horribly inefficient and not industrial strength. We already have implementations, which very possibly treat WHERE differently to EXTEND, but equally possibly call the same run-time function (low-level code) down in the guts.

Quote from AntC on July 23, 2020, 9:50 pm
Quote from dandl on July 23, 2020, 2:15 pm
Quote from AntC on July 23, 2020, 11:59 am
Quote from dandl on July 23, 2020, 8:32 am

Extending the previous post, there are actually two kinds of relcon: a predicate, and one that returns a value.

HHT 1975 (and Appendix A) see that as one algorithmic relation (or one relcon) with two uses. Supposing we have to hand a Appendix A-style relcon PLUS with attributes {X, Y, Z} such that X + Y == Z for all tuples. Consider two queries:

  • Given relvar R1 with attributes {X, Y, Z} amongst others: R1 WHERE (X + Y) = Z
  • Given relvar R2 with attributes {X, Y} amongst others, but lacking Z: EXTEND R2 : { Z := X + Y}

I understand the point and I considered it carefully. While there is a superficial attraction in using the same relcon for two different purposes, it only works in practice for a narrow set of cases. I see no way to use it as the basis for a construct such as WHERE STATUS >= 20. If you have an answer for that, I'd be interested.

Same answer. Supposing we have a Appendix A-style relcon GE with attributes {X, Y} such that X >= Y for all tuples.

  • S <AND> RENAME ((GE <AND> REL{TUP{Y 20}}) {ALL BUT Y}) : {STATUS := X}

Mixing up TD syntax and Algebra-A syntax makes it hard to know what you're intending. And since App-A does not provide any formal derivation for its relcons, it's hard to know what GE means here. And that not withstanding, (GE <AND> REL{TUP{Y 20}}) must be a constant which, with a suitable formalism, you could define directly. So what's your point? Why don't you show what you intend using a maths formalism such as set builder, instead of this informal hybrid?

Is it worth some different machinery for WHERE conditions mentioning only a single attribute? I can't see why: again use <AND> to a single-attribute relcon.

That's exactly what I propose. What do you propose differently?

Note ((GE <AND> REL{TUP{Y 20}}) {ALL BUT Y}) is exactly such a single-attribute relcon.

Single attribute but infinite cardinality? Show it in set-builder.

I am struck by the ghastly thought you are proposing whatever you're proposing as an implementation. No. It will be horribly inefficient and not industrial strength. We already have implementations, which very possibly treat WHERE differently to EXTEND, but equally possibly call the same run-time function (low-level code) down in the guts.

Of course not. That would be silly. What I am proposing is an ERA: Extended Relational Algebra, based on Algebra-A but extended from 5 formal definitions to 8, and thereby covering the whole of TD (which A does not), and a bit beyond.

This is the maths right at the heart of the ERA. A language design has to choose a set of shorthands and at that point implementation needs to be considered, but not here. If you really want to know what the low-level code looks like, I'd start with a modern SQL query optimiser, and look at the query plans it emits. Or look at the internal code generated by SQLite. It won't look much like what we're discussing here.

Andl - A New Database Language - andl.org

The ERA I propose consists of formal definitions for 8 operators. 5 of them are as per Algebra A: Rename, Remove, And, Or and Not.

The sixth is Relcon, inspired by the work of D&D in App-A and HHT. There is only one operator, but the formal definition for it depends on the arity of the function, and whether the function is a predicate (returns Boolean) or returns a value.

The seventh is Aggegrate. The formal definition is as follows.

This is the formalisation of simple aggregation, similar to SUM, MAX or MIN but allowing for a wider range of functions. It relies on some previously defined aggregating function f. An aggregating function has an argument which is a list of values (a ‘bag’ or ‘multiset’ rather than a ‘set’) and returns a single value. Extending this approach to more complex aggregations is left as an exercise.
Given a relation r, a type T such that <A,T> ∈ Hr and an aggregating function f with the type signature T[]->Ta then s = AGGREGATE(r,A,f) is defined as follows.

Hs = ( Hr minus {<A,T>} ) union {<A,Ta>}
Bs = { ts : ∃ v ∈ T, ∃ tr ∈ r, ∃ <A,T,v> ∈ tr, v ∈ v[],
   ts = tr minus {<A,T,v>} union {<A,Ta,f(v[])} }

Note: the term v[] should be read as: a list of the values of v for some value of ts.

The Aggregation operator is used in shorthands such as SUMMARIZE. For SUM, the aggregating function is one that takes a list of numeric values as its argument, and returns their sum.

This operator can be used as the basis for any of the shortands defined in TD and many more beside, but aggregations across two or attributes require additional formal definitions.

Please note that this is not compliant with TTM OO Pre 6. I don't see a way to provide a suitably compliant formal definition for the iterated function approach.

Andl - A New Database Language - andl.org

The eighth operator in my proposed Extended Relational Algebra is Transitive Closure. It comes in two variants, TCLOSE and ETCLOSE. Here is the formal definition. Note that ETCLOSE forms the basis for a GTC shorthand, as per RM VSS 5.

In the absence of an infinity symbol that works here please read r<inf> as the final term in an infinite series.

Transitive Closure
As transitive closure cannot be defined directly in set-builder notation, this formalisation defines it indirectly by means of a recurrence relation. This consists of a starting value (given) and a sequence of successors (defined by set-builder). The transitive closure is the fix-point union of that sequence. 

The starting value (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.
a)	Simple Transitive Closure
Given a relation r with the heading {<A,T>,<B,T>} for some type T then the successor relation r’ is defined as follows.

Hr’ = Hr
Br’ = { tr’ : tr’ ∈ Br or
   ∃ v1 ∈ T, ∃ v2 ∈ T, ∃ v3 ∈ T, ∃ v4 ∈ T,
   ∃ tr1 ∈ Br, tr1 = {<A,T,v1>, <B,T,v2>},
   ∃ tr2 ∈ Br, tr2 = {<A,T,v2>, <B,T,v3>}, 
   tr’ = {<A,T,v1>, <B,T,v3>} }

Then the transitive closure s = TCLOSE(r) is defined as follows.

S = r1 U r2 U r3 … r<inf>

This is a linear recurrence, which can be shown to reach a fix-point termination. 

The TCLOSE operator is used in a shorthand of the same name.
b)	Extended Transitive Closure
In this case each tuple is associated with a value, and this definition relies on some previously defined dyadic function f that takes values of that type as its argument and return value. The value computed represents in some sense the cost of that path.

Given a relation r with heading {<A,T>,<B,T>,<C,Tv>} for some types T and Tv, and a dyadic function f with the type signature Tv->Tv->Tv then the successor relation r’ is defined as follows.

Hr’ = Hr
Br’ = { tr’ : tr’ ∈ Br or
 ( ∃ v1 ∈ T, ∃ v2 ∈ T, ∃ v3 ∈ T, ∃ v4 ∈ T, 
   ∃ w1 ∈ Tv, ∃ w2 ∈ Tv,
   ∃ tr1 ∈ Br, tr1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>},
   ∃ tr2 ∈ Br, tr2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>},
   tr’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)} ) }
Then the extended transitive closure s = ETCLOSE(r,f) is defined as follows.

s = r1 U r2 U r3 … r<inf>

This is a linear recurrence, which can be shown to reach a fix-point termination. 

The ETCLOSE operator is used in combination with AGGREGATE in a shorthand to perform Generalised Transitive Closure as defined by D&D (RM VSS 5).

 

Andl - A New Database Language - andl.org
12