The Forum for Discussion about The Third Manifesto and Related Matters

Please or Register to create posts and topics.

Formal definitions for Relation Constant, Transitive Closure, Extended Transitive Closure

The following are updated and corrected versions of definitions posted earlier, plus Extended TC. There are formatting issues, for which I apologise, but I find this editor very difficult to drive.

New Value as Relational Constant

This is the formalisation of a relational constant or ‘relcon’, and relies on a previously defined function. Separate formalisations are provided for monadic and dyadic functions. Extending them to n-adic functions is left as an exercise.

Monadic

Given a function f with the type signature Tx->Ty, the relcon S is defined as follows.

Hs = { <X,Tx>, <Y,Ty> }
Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty 
  (ts = {<X,Tx,vx>, <Y,Ty,vy>} and f(Vx) = Vy)}

Dyadic

Given a function f with the type signature Tx->Ty->Tz, the relcon S is defined as follows.

Hs = { <X,Tx>, <Y,Ty>, <Z,Tz>}
Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty exists vz ∈ Tz
  (ts = {<X,Tx,vx>, <Y,Ty,vy, <Z,Tz,vz>} and f(Vx,Vy) = Vz)}

For the relcon PLUS in App-A, types Tx, Ty and Tz are all INTEGER, and the function f is the scalar operator "+".

Note that a ‘relcon’ can be used as one argument to a join. Operations sometimes known as WHERE, EXTEND, UPDATE and DELETE are shorthands that may include such a combination.

Transitive Closure

This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end alue is all the possible paths through the graph.

Given a set S of tuples with the heading {<X,T>,<Y,T>} for some type T, the successor relation S’ is defined as follows.

Hs’ = Hs
Bs’ = { ts’ : ts’ ∈ Bs or
(exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T
exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>})
exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>})
ts’ = {<A,T,v1>, <B,T,v3>} )) }

The transitive closure T is then defined as

T = S1 U S2 U S3 … S

Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.

Note that the operation sometimes known as TCLOSE corresponds to this definition.

Extended Transitive Closure

This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.

In this case each tuple is associated with a value, and this definition relies on some previously defined function f that takes values of that type as its argument.

Given a set S of tuples with the heading {<A,T>,<B,T>,<C,Tv>} for some types T and Tv, and a dyadic function f with the type signature Tv->Tv->Tv, the successor relation S’ is defined as follows.

Hs’ = Hs
Bs’ = { ts’ : ts’ ∈ Bs or
  (exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T 
   exists w1 ∈ Tv exists w2 ∈ Tv
   exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>})
   exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>}) 
   ts’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)} )) }

The transitive closure T is then defined as

T = S1 U S2 U S3 … S∞

Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.

Note that the function enables a cost to be calculated for each path. Generalised Transitive Closure as defined by D&D (RM VSS 5) requires an additional step of aggregation.

Andl - A New Database Language - andl.org

Not that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1

Let me rewrite my code as a successor function:

OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })
    RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;
  RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:
    (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);
END OPERATOR

How do I use it to calculate the GTC?

Quote from tobega on February 16, 2021, 2:58 pm

Not that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1

Let me rewrite my code as a successor function:

OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })
RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;
RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:
(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);
END OPERATOR
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOR
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })
    RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;
  RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:
    (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);
END OPERATOR

How do I use it to calculate the GTC?

First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.

The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.

The final step to GTC is by aggregation over the result of the ETCLOSE.

Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing  a series, as shown by my formal treatment. ETCLOSE ditto.

 

Andl - A New Database Language - andl.org
Quote from dandl on February 16, 2021, 11:50 pm
Quote from tobega on February 16, 2021, 2:58 pm

Not that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1

Let me rewrite my code as a successor function:

OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })
RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;
RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:
(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);
END OPERATOR
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOR
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })
    RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;
  RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:
    (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);
END OPERATOR

How do I use it to calculate the GTC?

First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.

The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.

The final step to GTC is by aggregation over the result of the ETCLOSE.

Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing  a series, as shown by my formal treatment. ETCLOSE ditto.

 

This is the series I am interested in.

So when you write

T = S1 U S2 U S3 … S∞

Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.


How do I use it to calculate the GTC?

First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.

The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.

The final step to GTC is by aggregation over the result of the ETCLOSE.

Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing  a series, as shown by my formal treatment. ETCLOSE ditto.

This is the series I am interested in.

So when you write

T = S1 U S2 U S3 … S∞

Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.

My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.

Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce  result.

It is a choice you are free to make: power or safety. Both are valuable.

I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.

 

Andl - A New Database Language - andl.org
Quote from dandl on February 19, 2021, 4:54 am

How do I use it to calculate the GTC?

First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.

The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.

The final step to GTC is by aggregation over the result of the ETCLOSE.

Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing  a series, as shown by my formal treatment. ETCLOSE ditto.

This is the series I am interested in.

So when you write

T = S1 U S2 U S3 … S∞

Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.

My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.

Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce  result.

It is a choice you are free to make: power or safety. Both are valuable.

I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.

 

I'm trying to learn how to do this in practice, so what are the algorithmic steps?

Since you mention that there are special requirements that seem to go beyond the formal definition given, such as I may not use a general programming language, what are those requirements? What does a safe successor function look like for the case of a container having amount of contained, how is it implemented? The function to combine amounts in the step is multiplication.

When I have a safe successor function what are the algorithmic steps of a safe ETCLOSE function? Without knowing all the steps it is impossible to verify that it is safe.

The name and definition for while come from the Alice book, but in practice it works the same as SQL CTE RECURSIVE. It's simple enough:

  1. Start with a seed (relational) value
  2. Evaluate a relational expression, with the seed as an argument
  3. Union the result with the seed
  4. Repeat until no new tuples.

Obviously the implementation will require a GP language (less than 10 lines of code in Andl), but queries just use relational expressions.

Andl - A New Database Language - andl.org

I think we also need to ensure that values are not deduplicated. Consider:

Container Contains quantity
red blue 1
red yellow 1
blue green 2
yellow green 2

In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.

In the same way, the outputs from successive applications of successor functions could contain duplications where the different path to the identical tuple is no longer apparent.

Come to think of it, the successor function must always have the current seed joined with the original relation (which represents the number one, or one step), otherwise we will skip levels.

Container Contained quantity
red blue 1
blue green 10
blue yellow 2
yellow green 5

So in the first application you would have "red, green, 10", "red, yellow,2" and "blue,green, 5". These must then be joined with the original for the next successor, giving "red, green, 10" again, so care must be taken that we when we "union" the first and second results end up with 20 greens in a red, not just 10.

Quote from tobega on February 22, 2021, 9:17 am

I think we also need to ensure that values are not deduplicated. Consider:

Container Contains quantity
red blue 1
red yellow 1
blue green 2
yellow green 2

In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.

Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.

Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates.  Hooray for consistency!

If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 22, 2021, 10:11 am
Quote from tobega on February 22, 2021, 9:17 am

I think we also need to ensure that values are not deduplicated. Consider:

Container Contains quantity
red blue 1
red yellow 1
blue green 2
yellow green 2

In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.

Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.

Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates.  Hooray for consistency!

If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.

Yes. What I am talking about is that in implementing the algorithm to correctly calculate an extended transitive closure, an explosion of parts, you must add something more than what is shown by the formal definition proposed in this thread. You yourself mentioned using a temporarilty generated id in the Rel implementation if I'm not mistaken?