Formal definitions for Relation Constant, Transitive Closure, Extended Transitive Closure
Quote from dandl on May 31, 2020, 6:51 amThe following are updated and corrected versions of definitions posted earlier, plus Extended TC. There are formatting issues, for which I apologise, but I find this editor very difficult to drive.
New Value as Relational Constant
This is the formalisation of a relational constant or ‘relcon’, and relies on a previously defined function. Separate formalisations are provided for monadic and dyadic functions. Extending them to n-adic functions is left as an exercise.
Monadic
Given a function
f
with the type signatureTx->Ty
, the relconS
is defined as follows.Hs = { <X,Tx>, <Y,Ty> } Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty (ts = {<X,Tx,vx>, <Y,Ty,vy>} and f(Vx) = Vy)}Dyadic
Given a function
f
with the type signatureTx->Ty->Tz
, the relconS
is defined as follows.Hs = { <X,Tx>, <Y,Ty>, <Z,Tz>} Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty exists vz ∈ Tz (ts = {<X,Tx,vx>, <Y,Ty,vy, <Z,Tz,vz>} and f(Vx,Vy) = Vz)}For the relcon PLUS in App-A, types Tx, Ty and Tz are all INTEGER, and the function f is the scalar operator "+".
Note that a ‘relcon’ can be used as one argument to a join. Operations sometimes known as WHERE, EXTEND, UPDATE and DELETE are shorthands that may include such a combination.
Transitive Closure
This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end alue is all the possible paths through the graph.
Given a set
S
of tuples with the heading{<X,T>,<Y,T>}
for some typeT
, the successor relationS’
is defined as follows.Hs’ = Hs
Bs’ = { ts’ : ts’ ∈ Bs or
(exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T
exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>})
exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>})
ts’ = {<A,T,v1>, <B,T,v3>} )) }The transitive closure
T
is then defined asT = S1 U S2 U S3 … S∞
Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.
Note that the operation sometimes known as TCLOSE corresponds to this definition.
Extended Transitive Closure
This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.
In this case each tuple is associated with a value, and this definition relies on some previously defined function f that takes values of that type as its argument.
Given a set S of tuples with the heading
{<A,T>,<B,T>,<C,Tv>}
for some typesT
andTv
, and a dyadic functionf
with the type signatureTv->Tv->Tv
, the successor relationS’
is defined as follows.Hs’ = Hs Bs’ = { ts’ : ts’ ∈ Bs or (exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T exists w1 ∈ Tv exists w2 ∈ Tv exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>}) exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>}) ts’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)} )) }The transitive closure
T
is then defined asT = S1 U S2 U S3 … S∞
Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.
Note that the function enables a cost to be calculated for each path. Generalised Transitive Closure as defined by D&D (RM VSS 5) requires an additional step of aggregation.
The following are updated and corrected versions of definitions posted earlier, plus Extended TC. There are formatting issues, for which I apologise, but I find this editor very difficult to drive.
New Value as Relational Constant
This is the formalisation of a relational constant or ‘relcon’, and relies on a previously defined function. Separate formalisations are provided for monadic and dyadic functions. Extending them to n-adic functions is left as an exercise.
Monadic
Given a function f
with the type signature Tx->Ty
, the relcon S
is defined as follows.
Hs = { <X,Tx>, <Y,Ty> } Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty (ts = {<X,Tx,vx>, <Y,Ty,vy>} and f(Vx) = Vy)}
Dyadic
Given a function f
with the type signature Tx->Ty->Tz
, the relcon S
is defined as follows.
Hs = { <X,Tx>, <Y,Ty>, <Z,Tz>} Bs = { ts : exists vx ∈ Tx exists vy ∈ Ty exists vz ∈ Tz (ts = {<X,Tx,vx>, <Y,Ty,vy, <Z,Tz,vz>} and f(Vx,Vy) = Vz)}
For the relcon PLUS in App-A, types Tx, Ty and Tz are all INTEGER, and the function f is the scalar operator "+".
Note that a ‘relcon’ can be used as one argument to a join. Operations sometimes known as WHERE, EXTEND, UPDATE and DELETE are shorthands that may include such a combination.
Transitive Closure
This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end alue is all the possible paths through the graph.
Given a set S
of tuples with the heading {<X,T>,<Y,T>}
for some type T
, the successor relation S’
is defined as follows.
Hs’ = Hs
Bs’ = { ts’ : ts’ ∈ Bs or
(exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T
exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>})
exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>})
ts’ = {<A,T,v1>, <B,T,v3>} )) }
The transitive closure T
is then defined as
T = S1 U S2 U S3 … S∞
Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.
Note that the operation sometimes known as TCLOSE corresponds to this definition.
Extended Transitive Closure
This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fix-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.
In this case each tuple is associated with a value, and this definition relies on some previously defined function f that takes values of that type as its argument.
Given a set S of tuples with the heading {<A,T>,<B,T>,<C,Tv>}
for some types T
and Tv
, and a dyadic function f
with the type signature Tv->Tv->Tv
, the successor relation S’
is defined as follows.
Hs’ = Hs Bs’ = { ts’ : ts’ ∈ Bs or (exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T exists v4 ∈ T exists w1 ∈ Tv exists w2 ∈ Tv exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>}) exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>}) ts’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)} )) }
The transitive closure T
is then defined as
T = S1 U S2 U S3 … S∞
Note that this is a linear recurrence, not a predicate. Transitive closure cannot be defined by first order predicate logic.
Note that the function enables a cost to be calculated for each path. Generalised Transitive Closure as defined by D&D (RM VSS 5) requires an additional step of aggregation.
Quote from tobega on February 16, 2021, 2:58 pmNot that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATORHow do I use it to calculate the GTC?
Not that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOR
How do I use it to calculate the GTC?
Quote from dandl on February 16, 2021, 11:50 pmQuote from tobega on February 16, 2021, 2:58 pmNot that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATORHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
Quote from tobega on February 16, 2021, 2:58 pmNot that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATORHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
Quote from tobega on February 18, 2021, 1:36 pmQuote from dandl on February 16, 2021, 11:50 pmQuote from tobega on February 16, 2021, 2:58 pmNot that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATORHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
Quote from dandl on February 16, 2021, 11:50 pmQuote from tobega on February 16, 2021, 2:58 pmNot that there's anything wrong with these definitions but I am unsure how they help with the question in https://forum.thethirdmanifesto.com/forum/topic/what-do-set-based-operations-buy-us/?part=1
Let me rewrite my code as a successor function:
OPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER })RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ;RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT:(LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT);END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATOROPERATOR CONTENTS_SUCCESSOR( CCA RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER }) RETURNS RELATION { CONTAINER C_ID, CONTAINED C_ID, AMOUNT INTEGER } ; RETURN (WITH ((RENAME CCA {CONTAINED, AMOUNT} (CONTAINED AS CONTAINER, AMOUNT AS MULTIPLIER)) JOIN CCA) AS LEFT: (LEFT EXTEND(TOTAL AS MULTIPLIER*AMOUNT)){CONTAINER, CONTAINED, TOTAL} RENAME(TOTAL AS AMOUNT); END OPERATORHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
Quote from dandl on February 19, 2021, 4:54 amHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.
Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce result.
It is a choice you are free to make: power or safety. Both are valuable.
I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.
How do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.
Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce result.
It is a choice you are free to make: power or safety. Both are valuable.
I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.
Quote from tobega on February 21, 2021, 5:14 amQuote from dandl on February 19, 2021, 4:54 amHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.
Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce result.
It is a choice you are free to make: power or safety. Both are valuable.
I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.
I'm trying to learn how to do this in practice, so what are the algorithmic steps?
Since you mention that there are special requirements that seem to go beyond the formal definition given, such as I may not use a general programming language, what are those requirements? What does a safe successor function look like for the case of a container having amount of contained, how is it implemented? The function to combine amounts in the step is multiplication.
When I have a safe successor function what are the algorithmic steps of a safe ETCLOSE function? Without knowing all the steps it is impossible to verify that it is safe.
Quote from dandl on February 19, 2021, 4:54 amHow do I use it to calculate the GTC?
First you need someone to write an ETCLOSE function for you, and add it to TD. The ETCLOSE is just like TCLOSE except it has an additional feature.
The TCLOSE function adds tuples to a self-joinable relation representing additional nodes in the graph. Each new tuple has two parents. The extra feature in ETCLOSE allows attribute(s) in each new tuple value to be given value(s) calculated from its parents. The design of that language feature I leave to the implementer, but it might well be a function on tuples. The JOIN in your code is not needed, as it is part of the ETCLOSE.
The final step to GTC is by aggregation over the result of the ETCLOSE.
Note that TCLOSE can be implemented using either recursion or iteration with mutable state, but it needs neither. TCLOSE requires only summing a series, as shown by my formal treatment. ETCLOSE ditto.
This is the series I am interested in.
So when you write
T = S1 U S2 U S3 … S∞
Is S1 the Successor(S0) and S2 the Successor(S1) and so on? You end when you get an empty set? And you just union them together, and then what? Aggregate over something by some function? I have already in my successor function applied the transform to multiply the parent count by the child count to unify the two counts.
My formal definition shows that (a) the successor function can be first order (b) the series has a finite sum. That means that ETCLOSE is safe, it is guaranteed to terminate and produce a result. Any aggregation following it is also safe. That is the strength of RA queries: they are guaranteed safe.
Your successor function is expressed in a Turing Complete programming language, therefore it is not safe. There is no guarantee it will terminate or produce result.
It is a choice you are free to make: power or safety. Both are valuable.
I actually implemented while, which is like SQL CTE RECURSIVE, more powerful than ETCLOSE and still safe. The terminating condition is no new tuples for the union. I don't have a formal definition for that.
I'm trying to learn how to do this in practice, so what are the algorithmic steps?
Since you mention that there are special requirements that seem to go beyond the formal definition given, such as I may not use a general programming language, what are those requirements? What does a safe successor function look like for the case of a container having amount of contained, how is it implemented? The function to combine amounts in the step is multiplication.
When I have a safe successor function what are the algorithmic steps of a safe ETCLOSE function? Without knowing all the steps it is impossible to verify that it is safe.
Quote from dandl on February 21, 2021, 6:34 amThe name and definition for while come from the Alice book, but in practice it works the same as SQL CTE RECURSIVE. It's simple enough:
- Start with a seed (relational) value
- Evaluate a relational expression, with the seed as an argument
- Union the result with the seed
- Repeat until no new tuples.
Obviously the implementation will require a GP language (less than 10 lines of code in Andl), but queries just use relational expressions.
The name and definition for while come from the Alice book, but in practice it works the same as SQL CTE RECURSIVE. It's simple enough:
- Start with a seed (relational) value
- Evaluate a relational expression, with the seed as an argument
- Union the result with the seed
- Repeat until no new tuples.
Obviously the implementation will require a GP language (less than 10 lines of code in Andl), but queries just use relational expressions.
Quote from tobega on February 22, 2021, 9:17 amI think we also need to ensure that values are not deduplicated. Consider:
Container Contains quantity red blue 1 red yellow 1 blue green 2 yellow green 2 In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
In the same way, the outputs from successive applications of successor functions could contain duplications where the different path to the identical tuple is no longer apparent.
Come to think of it, the successor function must always have the current seed joined with the original relation (which represents the number one, or one step), otherwise we will skip levels.
Container Contained quantity red blue 1 blue green 10 blue yellow 2 yellow green 5 So in the first application you would have "red, green, 10", "red, yellow,2" and "blue,green, 5". These must then be joined with the original for the next successor, giving "red, green, 10" again, so care must be taken that we when we "union" the first and second results end up with 20 greens in a red, not just 10.
I think we also need to ensure that values are not deduplicated. Consider:
Container | Contains | quantity |
red | blue | 1 |
red | yellow | 1 |
blue | green | 2 |
yellow | green | 2 |
In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
In the same way, the outputs from successive applications of successor functions could contain duplications where the different path to the identical tuple is no longer apparent.
Come to think of it, the successor function must always have the current seed joined with the original relation (which represents the number one, or one step), otherwise we will skip levels.
Container | Contained | quantity |
red | blue | 1 |
blue | green | 10 |
blue | yellow | 2 |
yellow | green | 5 |
So in the first application you would have "red, green, 10", "red, yellow,2" and "blue,green, 5". These must then be joined with the original for the next successor, giving "red, green, 10" again, so care must be taken that we when we "union" the first and second results end up with 20 greens in a red, not just 10.
Quote from Dave Voorhis on February 22, 2021, 10:11 amQuote from tobega on February 22, 2021, 9:17 amI think we also need to ensure that values are not deduplicated. Consider:
Container Contains quantity red blue 1 red yellow 1 blue green 2 yellow green 2 In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.
Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates. Hooray for consistency!
If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.
Quote from tobega on February 22, 2021, 9:17 amI think we also need to ensure that values are not deduplicated. Consider:
Container Contains quantity red blue 1 red yellow 1 blue green 2 yellow green 2 In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.
Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates. Hooray for consistency!
If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.
Quote from tobega on February 22, 2021, 12:59 pmQuote from Dave Voorhis on February 22, 2021, 10:11 amQuote from tobega on February 22, 2021, 9:17 amI think we also need to ensure that values are not deduplicated. Consider:
Container Contains quantity red blue 1 red yellow 1 blue green 2 yellow green 2 In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.
Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates. Hooray for consistency!
If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.
Yes. What I am talking about is that in implementing the algorithm to correctly calculate an extended transitive closure, an explosion of parts, you must add something more than what is shown by the formal definition proposed in this thread. You yourself mentioned using a temporarilty generated id in the Rel implementation if I'm not mistaken?
Quote from Dave Voorhis on February 22, 2021, 10:11 amQuote from tobega on February 22, 2021, 9:17 amI think we also need to ensure that values are not deduplicated. Consider:
Container Contains quantity red blue 1 red yellow 1 blue green 2 yellow green 2 In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.
Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates. Hooray for consistency!
If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.
Yes. What I am talking about is that in implementing the algorithm to correctly calculate an extended transitive closure, an explosion of parts, you must add something more than what is shown by the formal definition proposed in this thread. You yourself mentioned using a temporarilty generated id in the Rel implementation if I'm not mistaken?