The Forum for Discussion about The Third Manifesto and Related Matters


Formal definitions for Relation Constant, Transitive Closure, Extended Transitive Closure

Quote from tobega on February 22, 2021, 12:59 pm
Quote from Dave Voorhis on February 22, 2021, 10:11 am
Quote from tobega on February 22, 2021, 9:17 am

I think we also need to ensure that values are not deduplicated. Consider:

Container  Contains  quantity
red        blue      1
red        yellow    1
blue       green     2
yellow     green     2

In the successor function you would end up with two records "red, green, 2" which will get deduplicated unless you are careful.
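The collapse described here can be seen in a small sketch (illustrative Python, not from the thread; a relation is modelled as a set of tuples, and multiplication stands in for the aggregation function of an explosion of parts):

```python
# Edges of the container graph from the example above: (Container, Contains, quantity)
edges = {
    ("red", "blue", 1), ("red", "yellow", 1),
    ("blue", "green", 2), ("yellow", "green", 2),
}

def successors(rel):
    # One derivation step: join on the middle node, multiply quantities.
    return {(a, c, q1 * q2)
            for (a, b, q1) in rel
            for (b2, c, q2) in rel
            if b == b2}

step = successors(edges)
# Both chains red->blue->green and red->yellow->green produce the identical
# tuple ("red", "green", 2), and the set keeps only one copy of it.
print(("red", "green", 2) in step)  # True
print(len([t for t in step if t[0] == "red" and t[1] == "green"]))  # 1
```

The two distinct derivations are indistinguishable as tuples, which is exactly the deduplication being warned about.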

Deduplication by UNION (and all relational operators) is what you expect. Preserving duplicates (by making them non-duplicates) is always a special case.

Of course, SQL is the opposite; preserving duplicates is what you expect and deduplication is a special case. Except for UNION, which deduplicates and needs to be UNION ALL to preserve duplicates.  Hooray for consistency!

If you regard databases as mere containers, perhaps preserve-duplicates-by-default is more intuitive. If you regard a database as a collection of fact assertions, then preserve-duplicates-by-default is counter-intuitive.

Yes. What I am talking about is that to implement an algorithm that correctly calculates an extended transitive closure, such as an explosion of parts, you must add something beyond what is shown by the formal definition proposed in this thread. You yourself mentioned using a temporarily generated id in the Rel implementation, if I'm not mistaken?

Yes, to implement generalised aggregation.

But I'm not sure a formal definition of transitive closure needs to consider it. It's the user's implementation concern, no?

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org
Quote from Dave Voorhis on February 22, 2021, 1:36 pm

Well, if the user strictly follows the definition given for the successor function, bugs will ensue because no mention is made of this.

And if the user uses a utility that is implemented, as specified in the proposal, as the union of the results of successor functions before aggregating, then merging values properly in the successor function alone doesn't help. The transitive closure utility would not distinguish between values from different generations that should be aggregated but get deduplicated because they happen to be the same.

Of course, everything is the user's problem in the end, but we could try to make the specifications correct so that the user stands a chance.

Quote from tobega on February 22, 2021, 1:53 pm

Does mention of deduplication for the definition of extended transitive closure not follow implicitly from its definition?

I mean, we don't need to mention deduplication in, say, projection, do we?

Or am I missing a point here?
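The projection analogy can be sketched as follows (illustrative Python, treating a relation as a set of tuples; the timestamps and part id are invented for the example):

```python
# Two distinct facts that share PartID and Quantity:
facts = {
    ("2021-02-21", "P1", 10),  # (Timestamp, PartID, Quantity)
    ("2021-02-22", "P1", 10),
}

# Projecting on {PartID, Quantity} collapses the two facts into one tuple.
projected = {(pid, qty) for (ts, pid, qty) in facts}
print(len(projected))  # 1

# Projecting on {Timestamp, PartID, Quantity} keeps both facts.
print(len({(ts, pid, qty) for (ts, pid, qty) in facts}))  # 2
```

The deduplication is inherent in projection being a set operation; whether to retain a distinguishing attribute is the user's choice.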

Quote from Dave Voorhis on February 22, 2021, 2:03 pm

Which definition are you reading? I am reading the one proposed by dandl at the start of the thread, but maybe I am confusing what is meant and this isn't "explosion of parts"?

Copying here:

Extended Transitive Closure

This formalisation defines a recurrence relation consisting of a starting value and the elements of a sequence. The transitive closure is the union of that sequence, which can be shown to reach a fixed-point termination. The starting point (‘seed’) represents known edges in a directed graph; the end value is all the possible paths through the graph.

In this case each tuple is associated with a value, and this definition relies on some previously defined function f that takes values of that type as its argument.

Given a set S of tuples with the heading {<A,T>,<B,T>,<C,Tv>} for some types T and Tv, and a dyadic function f with the type signature Tv->Tv->Tv, the successor relation S’ is defined as follows.

Hs’ = Hs
Bs’ = { ts’ : ts’ ∈ Bs or
  (exists v1 ∈ T exists v2 ∈ T exists v3 ∈ T
   exists w1 ∈ Tv exists w2 ∈ Tv
   exists ts1 ∈ Bs (ts1 = {<A,T,v1>, <B,T,v2>, <C,Tv,w1>})
   exists ts2 ∈ Bs (ts2 = {<A,T,v2>, <B,T,v3>, <C,Tv,w2>})
   ts’ = {<A,T,v1>, <B,T,v3>, <C,Tv,f(w1,w2)>} ) }

The transitive closure T is then defined as

T = S1 U S2 U S3 … S∞

Quote from tobega on February 22, 2021, 3:04 pm

Yes, that's the one I was referring to. It appears to rely on a definition of UNION (U) that presumes deduplication: by the definition of union, by its operands being relations, and by the successor relation itself being a relation.

Quote from Dave Voorhis on February 22, 2021, 3:54 pm

Exactly. So if I follow that definition, which I was told would help me implement a correct explosion-of-parts calculation, I don't get a correct result. The specification of the successor function has no attribute to distinguish the 10 green containers inside the red via the blue from the 10 green containers transitively in the red via the blue and yellow, so it will calculate merely 10 greens inside the red rather than the 20 it should be. Nor does it have any way to separate the two greens reached via the yellow from the two greens reached via the blue, so in that case I will incorrectly get two greens inside the red instead of four.

Quote from tobega on February 22, 2021, 4:09 pm

Isn't that exactly the same issue as, for example, projecting on {PartID, Quantity} when the same PartID has the same quantity on two different days?

Therefore, isn't it up to you (and not up to the definition of projection, and thus not up to the definition of extended transitive closure's successor function, or any other relational operator) to recognise the need to project on {Timestamp, PartID, Quantity} so that doesn't happen?

This is, I think, different from the requirement to use a generated sequence number to prevent deduplication in Rel's user-defined aggregation. That's because it always projects out the aggregate expression, so undesirably collapsed duplicates would be inevitable and otherwise unavoidable. I don't think that's the case with @dandl's extended transitive closure.

Quote from Dave Voorhis on February 22, 2021, 4:31 pm

Thank you for raising this interesting issue, which I had overlooked. But no, it has nothing to do with de-duplication or the UNION. The problem can arise in a network of just 4 nodes.

The earlier TCLOSE definition says: if there are links A->B and B->C then there is a link A->C and we can add that as a new tuple. If there is also A->D and D->C then there is A->C, but we already knew that, so there is nothing to do.

This ETCLOSE definition says: if there are links A->B with weight W1 and B->C (W2) then there is a link A->C and we can add that as a new tuple with weight f(W1,W2). If there is also A->D (W3) and D->C (W4) then there is A->C but with weight f(W3,W4), and we can't handle that. Bummer!

But the predicate for this relation says: there is a path from A to B with weight W. It does not allow multiple paths with different weights. As a relvar it should have a key on the linking nodes to prevent adding duplicate paths. It must never rely on weights to de-duplicate.

If multiple paths are possible, the predicate must allow for that by adding another key attribute. I think that's what Dave is saying too.

What I don't see right now is exactly how to write a definition to generate that new key attribute for a new tuple, but I don't think it will be too hard. I'll get back to you on that one.
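One way to realise the extra key attribute is sketched below (illustrative Python by the editor, not a definition proposed in the thread: the path travelled serves as the generated key, and f is again illustratively multiplication):

```python
# Sketch: carry the path travelled as an extra key attribute, so tuples for
# distinct paths can never collide, then aggregate only at the end.
def successor(rel, f):
    # Each tuple is (start, end, path, weight); path is the key attribute.
    # Concatenating p1 + p2 repeats the middle node, which is harmless here.
    return rel | {(a, c, p1 + p2, f(w1, w2))
                  for (a, b1, p1, w1) in rel
                  for (b2, c, p2, w2) in rel
                  if b1 == b2}

def etclose(rel, f):
    # Iterate to a fixed point, as in the proposed definition.
    while True:
        nxt = successor(rel, f)
        if nxt == rel:
            return rel
        rel = nxt

seed = {(a, b, (a, b), w) for (a, b, w) in
        {("red", "blue", 1), ("red", "yellow", 1),
         ("blue", "green", 2), ("yellow", "green", 2)}}
closure = etclose(seed, lambda w1, w2: w1 * w2)

# red->blue->green and red->yellow->green now remain distinct tuples,
# so aggregating over (A, B) gives the expected count of greens in red.
total = sum(w for (a, b, p, w) in closure if (a, b) == ("red", "green"))
print(total)  # 4
```

Whether the key is a path, a timestamp, or a generated id is the user's choice; the point is only that some distinguishing attribute must exist before the union happens.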

 

Andl - A New Database Language - andl.org
Quote from dandl on February 23, 2021, 1:07 am

Yes, essentially. I'm also saying that if the additional key attribute can be user-specified, then there's no need for the definition of extended transitive closure to include it, because it's entirely a user problem. (The same applies to, for example, projection.)

On the other hand, if applying the definition of extended transitive closure forces deduplication without otherwise being able to specify another key attribute (except in the system implementation of extended transitive closure), then it perhaps belongs in the definition.
