Update on "C# is a D" project
Quote from dandl on May 15, 2020, 1:32 amQuote from Erwin on May 14, 2020, 3:52 pmQuote from dandl on May 14, 2020, 1:47 pmQuote from Erwin on May 14, 2020, 11:36 amQuote from Dave Voorhis on May 14, 2020, 10:08 amQuote from Erwin on May 14, 2020, 8:55 amQuote from Dave Voorhis on May 10, 2020, 9:07 amQuote from dandl on May 10, 2020, 8:04 am
- Generalised aggregation as per RM Pre 6 is straightforward. Andl has this, but I'm not aware of any other D that has.
Rel does. See https://reldb.org/c/wp-content/uploads/2016/06/User-Defined-Aggregate-Operators-in-Tutorial-D-and-Rel.pdf
SIRA_PRISE has too. See http://shark.armchair.mb.ca/~erwin/languageandgrammar_0105.html
Doesn't SIRA_PRISE have generalised transitive closure, too?
Yup.
<ROgtclose
> := GTCLOSE(<
RelationalExpression>,(<
AttributeNamePair>+,),(<
GTCloseDef>*))
And in this case there is indeed justification for including it since there's not really a way to take the "use a recursive operator" path.
Good to hear. That isn't really enough to work out what it can do: is it as general as while? (Which, by the way, is pretty much like SQL CTE RECURSIVE.)
And there's also this little curious coincidence :
<ROTransform
> := TRANSFORM(<
RelationalExpression>,(<
TransformDef>*,))
<TransformDef> := <
AttributeName>[(<
Expression>)]
Acknowledged. That one I did know about, and that influenced the choice of name. It may not be as 'pure' as the RA, but it sure can save on having to combine other operators to achieve a simple outcome.
The attribute name pairs control the tuple matching (which is on attribute value equality exclusively). The GTCloseDefs control the computations for building "new" tuples in the closure. Since while allows programming, so to speak, (and therefore theoretically supports any arbitrary predicate to be applied for tuples to match) it's unlikely to be as powerful as while, but otoh the programming in while makes it non-declarative and I've always wanted to stay away from that.
Can you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
I'm baffled you seem to think that what is merely a syntactic shorthand for combinations of primitive operators of the RA, can in any way be "not as pure as the RA". I have no idea what your agenda is with that choice of words. It's RA. Plain and simple.
Not intended to be derogatory, just suggesting that a combination is less 'pure' than individual operators. I agree, thinking of it as simply a shorthand is a good way to look at it. In any case, it's really useful.
Quote from Erwin on May 14, 2020, 3:52 pmQuote from dandl on May 14, 2020, 1:47 pmQuote from Erwin on May 14, 2020, 11:36 amQuote from Dave Voorhis on May 14, 2020, 10:08 amQuote from Erwin on May 14, 2020, 8:55 amQuote from Dave Voorhis on May 10, 2020, 9:07 amQuote from dandl on May 10, 2020, 8:04 am
- Generalised aggregation as per RM Pre 6 is straightforward. Andl has this, but I'm not aware of any other D that has.
Rel does. See https://reldb.org/c/wp-content/uploads/2016/06/User-Defined-Aggregate-Operators-in-Tutorial-D-and-Rel.pdf
SIRA_PRISE has too. See http://shark.armchair.mb.ca/~erwin/languageandgrammar_0105.html
Doesn't SIRA_PRISE have generalised transitive closure, too?
Yup.
<ROgtclose
> := GTCLOSE(<
RelationalExpression>,(<
AttributeNamePair>+,),(<
GTCloseDef>*))
And in this case there is indeed justification for including it since there's not really a way to take the "use a recursive operator" path.
Good to hear. That isn't really enough to work out what it can do: is it as general as while? (Which, by the way, is pretty much like SQL CTE RECURSIVE.)
And there's also this little curious coincidence :
<ROTransform
> := TRANSFORM(<
RelationalExpression>,(<
TransformDef>*,))
<TransformDef> := <
AttributeName>[(<
Expression>)]
Acknowledged. That one I did know about, and that influenced the choice of name. It may not be as 'pure' as the RA, but it sure can save on having to combine other operators to achieve a simple outcome.
The attribute name pairs control the tuple matching (which is on attribute value equality exclusively). The GTCloseDefs control the computations for building "new" tuples in the closure. Since while allows programming, so to speak, (and therefore theoretically supports any arbitrary predicate to be applied for tuples to match) it's unlikely to be as powerful as while, but otoh the programming in while makes it non-declarative and I've always wanted to stay away from that.
Can you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
I'm baffled you seem to think that what is merely a syntactic shorthand for combinations of primitive operators of the RA, can in any way be "not as pure as the RA". I have no idea what your agenda is with that choice of words. It's RA. Plain and simple.
Not intended to be derogatory, just suggesting that a combination is less 'pure' than individual operators. I agree, thinking of it as simply a shorthand is a good way to look at it. In any case, it's really useful.
Quote from dandl on May 15, 2020, 2:02 amSo it seems that most modern GP languages can satisfy the requirements of TTM/D, with a greater or lesser degree of type safety, with or without a compiler tweak, pre-processor or runtime checks. But would I use it?
One thing I learned from writing samples is that creating individual types for each step of a relational expression is clunky and counter-intuitive. Generics provide a high degree of type safety, but it's really the headings you want to see and check, and they (a) get stuck in the type declaration and (b) can't currently be checked by the compiler. So is there an alternative? TTM is not that prescriptive about how the RA should work, in terms of type checking. It should be enough to have strong typing of:
- data sources (via constructors)
- tuple access in select, extend/transform and aggregate (via getters)
- relational assignment (more constructors and getters).
The intermediate steps are either non-converting (union, while) or heading-driven (project, join). IOW it should be possible to achieve the same result by declaring types only for those tuples/relations that actually need a constructor or getter, and relying on headings and base class operations for everything else.
Personally, I don't see a problem in having strong type checking at each end of a pipeline, and leaving heading checking to the runtime. The runtime can also use reflection to check the heading against declared getters when available. The goal would be code that shows literal headings and (mostly) no type annotations in what you write.
So it seems that most modern GP languages can satisfy the requirements of TTM/D, with a greater or lesser degree of type safety, with or without a compiler tweak, pre-processor or runtime checks. But would I use it?
One thing I learned from writing samples is that creating individual types for each step of a relational expression is clunky and counter-intuitive. Generics provide a high degree of type safety, but it's really the headings you want to see and check, and they (a) get stuck in the type declaration and (b) can't currently be checked by the compiler. So is there an alternative? TTM is not that prescriptive about how the RA should work, in terms of type checking. It should be enough to have strong typing of:
- data sources (via constructors)
- tuple access in select, extend/transform and aggregate (via getters)
- relational assignment (more constructors and getters).
The intermediate steps are either non-converting (union, while) or heading-driven (project, join). IOW it should be possible to achieve the same result by declaring types only for those tuples/relations that actually need a constructor or getter, and relying on headings and base class operations for everything else.
Personally, I don't see a problem in having strong type checking at each end of a pipeline, and leaving heading checking to the runtime. The runtime can also use reflection to check the heading against declared getters when available. The goal would be code that shows literal headings and (mostly) no type annotations in what you write.
Quote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics. The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE. Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
Quote from dandl on May 15, 2020, 1:32 am
Can you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics. The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE. Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
Quote from dandl on May 16, 2020, 1:15 amQuote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I'm not sure about the use of those terms or other literature. What I see is that GTC involves two steps: a first (multiplicative) computation executed on the way down the tree expanding each path, and a second (aggregating) computation to build the final result. The second part is easy, it's just aggregation. The first part is beyond ordinary TC, because it requires the path-building step to be exposed so that a computation can be added. Here is my code again:
WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());
- The seed already has the final answer for immediate contained parts, before exploding.
- The Rename/Join just pull in outside data to get QTY for the current Minor tuple.
- The Transform does the work of adding the new step in a path with the multiplicative computation.
- The final aggregate does the second computation.
There is nothing extraneous, everything here is a required part of GTC. But the criticism is that while is too powerful: it is effectively Turing complete. It should be possible to open out the mechanism of a simple TC to include a simple transform step, just like shown here, if you don't want the full while. I found while implementation to be easy and short, so that's why I preferred it this way
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
I disagree. The order of calculation is deterministic with respect to each path: top down.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics. The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
No, may lack of safety is a direct product of the power. You can easily construct non-terminating (infinite) sequences using while.
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE. Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
So does that lead to a full solution?
Quote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I'm not sure about the use of those terms or other literature. What I see is that GTC involves two steps: a first (multiplicative) computation executed on the way down the tree expanding each path, and a second (aggregating) computation to build the final result. The second part is easy, it's just aggregation. The first part is beyond ordinary TC, because it requires the path-building step to be exposed so that a computation can be added. Here is my code again:
WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());
- The seed already has the final answer for immediate contained parts, before exploding.
- The Rename/Join just pull in outside data to get QTY for the current Minor tuple.
- The Transform does the work of adding the new step in a path with the multiplicative computation.
- The final aggregate does the second computation.
There is nothing extraneous, everything here is a required part of GTC. But the criticism is that while is too powerful: it is effectively Turing complete. It should be possible to open out the mechanism of a simple TC to include a simple transform step, just like shown here, if you don't want the full while. I found while implementation to be easy and short, so that's why I preferred it this way
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
I disagree. The order of calculation is deterministic with respect to each path: top down.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics. The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
No, may lack of safety is a direct product of the power. You can easily construct non-terminating (infinite) sequences using while.
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE. Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
So does that lead to a full solution?
Quote from AntC on May 16, 2020, 9:05 amQuote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter).
I can't help feel your approach (below) is not legit, in general. Firstly, if you want to distinguish **direct** inclusion vs depth-2 inclusion vs depth-3 inclusion, you don't need Transitive closure at all (generalised nor garden variety). Just explicitly JOIN the number of times to get the required depth.
More important: the hierarchy is not necessarily a well-behaved tree in general. Suppose P3 is not a part but an assembly (nut P4 + bolt P5). And suppose P1 needs two bolts P5 without nuts as well as the P3s. So the overall network from P1 contains multiple 'diamond' configurations. In general the network mightn't be a tree at all: it could contain cycles (suppose it's an airline route: bidirectional on every segment; and famously on some routing algorithms it's more efficient to go back on yourself for some segments -- for a suitably airline-twisted measure of 'efficient').
I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
No. Because the authors of that literature were/are mathematicians, and they're solving a generalised problem with a generalised solution that produces a determinate result from any (no matter how maniacal) digraph. You have a non-general solution that is going to fall apart given suitably maniacal input.
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
That looks like it's given something that isn't a relation. Is that attribute in
{ }
a RVA? (It looks suspiciously like a non-relational 'repeating group'. You say it's an ordered_list -- oh dear!) What is/are the attribute names at the outermost level and the inner level(s)?and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
No TTM's is the opposite of reckless: it's well-behaved and determinate. It comes up with the same answer irrespective of which routings an implementation uses. Not (needing to) guarantee the order is precisely the point. Because the semantics are that its implementation must keep iterating until it reaches a fixed point of no further tuples generated. That's mathematically sound; I might say the only mathematically sound way to specify the behaviour for an operation. It gets a determinate result out of a network with cycles. The result is a relation -- i.e. a set -- not involving 'ordered lists'.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics.
That's what TTM does deliver: reaching a fixed point is the only "safe" interpretation. If the solution needs the user to think about depth/pathways, then transitive closure is the wrong tool. Whatever you've produced is not a transitive closure. So it might be meeting some need. Call it something else.
The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE.
Please show how it goes for airline routings. Or any network with cycles. Beware cycles are reasonably common in process manufacturing settings. I've done some implementations at dairy factories in NZ: you can't control the water content/chemical properties of the incoming materials: rain/grass/cows, innit. Producing milk powder might involve several cycles through the dryer, mixing fresh milk with already-partially-concentrated milk. No discrete nuts and bolts to track through a network of parts explosion.
We might need to know there's a batch of milk in which 25% has gone three cycles. We don't/can't track which particular molecules in the tank.
Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
Quote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter).
I can't help feel your approach (below) is not legit, in general. Firstly, if you want to distinguish **direct** inclusion vs depth-2 inclusion vs depth-3 inclusion, you don't need Transitive closure at all (generalised nor garden variety). Just explicitly JOIN the number of times to get the required depth.
More important: the hierarchy is not necessarily a well-behaved tree in general. Suppose P3 is not a part but an assembly (nut P4 + bolt P5). And suppose P1 needs two bolts P5 without nuts as well as the P3s. So the overall network from P1 contains multiple 'diamond' configurations. In general the network mightn't be a tree at all: it could contain cycles (suppose it's an airline route: bidirectional on every segment; and famously on some routing algorithms it's more efficient to go back on yourself for some segments -- for a suitably airline-twisted measure of 'efficient').
I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
No. Because the authors of that literature were/are mathematicians, and they're solving a generalised problem with a generalised solution that produces a determinate result from any (no matter how maniacal) digraph. You have a non-general solution that is going to fall apart given suitably maniacal input.
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
That looks like it's given something that isn't a relation. Is that attribute in { }
a RVA? (It looks suspiciously like a non-relational 'repeating group'. You say it's an ordered_list -- oh dear!) What is/are the attribute names at the outermost level and the inner level(s)?
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
TTM's proposal might be too reckless. They can never guarantee that the order in which concatenations and aggregations will be applied (which is a function of happenstance of the order in which the tuples will be ran into by the internal iteration algorithms, that order itself being a function of the randomness of physical implementation), will truly be the right match for the semantics of the business problem to which their GTCLOSE is applied.
No TTM's is the opposite of reckless: it's well-behaved and determinate. It comes up with the same answer irrespective of which routings an implementation uses. Not (needing to) guarantee the order is precisely the point. Because the semantics are that its implementation must keep iterating until it reaches a fixed point of no further tuples generated. That's mathematically sound; I might say the only mathematically sound way to specify the behaviour for an operation. It gets a determinate result out of a network with cycles. The result is a relation -- i.e. a set -- not involving 'ordered lists'.
You will find "my" GTCLOSE requiring "too much organizing" by the user. I find it a reasonable measure to help the user ascertain the semantics of what the query does are really the semantics that same user wants to see answered. Our mileages vary. Which doesn't surprise me given that we're [almost] at as extreme opposite ends of the earth as it gets.
And when it comes to "safety" : there ain't none such in either approach because the "safety" you'd really want is exactly at the level of semantics.
That's what TTM does deliver: reaching a fixed point is the only "safe" interpretation. If the solution needs the user to think about depth/pathways, then transitive closure is the wrong tool. Whatever you've produced is not a transitive closure. So it might be meeting some need. Call it something else.
The kind of programmer that comes a gross to the dozen gets this sort of stuff wrong, period. I believe your (and Alice's) "coder" approach doesn't differ zilch in this respect from my "more limited" approach that at least manages to stay "declarative". In the final aftermath, both require the coder to keep an EXTREMELY VIGILANT eye on "what it is that he's doing".
I can show some detail of how "maintaining the identity of the path" goes in "my" GTCLOSE.
Please show how it goes for airline routings. Or any network with cycles. Beware cycles are reasonably common in process manufacturing settings. I've done some implementations at dairy factories in NZ: you can't control the water content/chemical properties of the incoming materials: rain/grass/cows, innit. Producing milk powder might involve several cycles through the dryer, mixing fresh milk with already-partially-concentrated milk. No discrete nuts and bolts to track through a network of parts explosion.
We might need to know there's a batch of milk in which 25% has gone three cycles. We don't/can't track which particular molecules in the tank.
Paths are relations of {ORDINAL, POINT_IN_PATH} attributes where there is always an ORDINAL 1 and ditto for all the ordinals up to the cardinality of the relation. "Appending" one path to another, is relational UNION, ***after*** the ordinals of the path being appended have been "shifted up" by the number of points in the path it is being appended to. That's this particular part of the GTCLOSE spec in my horrific example :
FULLPATH(UNION(__L__FULLPATH,RENAME(PROJECT(EXTEND(__R__FULLPATH,STEP2(SUB(PLUS(LENGTH(__L__FULLPATH),STEP),INT(1)))),(RELVARNAME,RECORDTYPENAME,STEP2,STEPLOCATTRIBUTES)),(STEP2,STEP))))
There could be a shorthand to fit the pattern, but the shorthand definer is short of hands to do so ...
And obtaining the value 10 from the values 5 and 2 would be a matter of specifying
QUANTITY(MULT(__L__QUANTITY,__R__QUANTITY))
Computations are applied after the operator has decided two tuples will match for the production of a new one, and the syntax needs to offer a way to reference both the one and the other of those two tuples. That's __L__ and __R__.
Quote from Erwin on May 16, 2020, 6:15 pmQuote from dandl on May 16, 2020, 1:15 amQuote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I'm not sure about the use of those terms or other literature. What I see is that GTC involves two steps: a first (multiplicative) computation executed on the way down the tree expanding each path, and a second (aggregating) computation to build the final result. The second part is easy, it's just aggregation. The first part is beyond ordinary TC, because it requires the path-building step to be exposed so that a computation can be added. Here is my code again:
WriteLine("While");var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty"var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty"var exp = seed.While(t => t.Rename<TupMzA>() // "Major", "zmatch", "AggQty".Join<TupzMQ, TupMMQA>(zmq).Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty)));WriteLine(exp.Format());WriteLine("P1 -> P5");WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5").Aggregate<TupMMT,int>((t,a) => a + t.AggQty).Format());WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());
- The seed already has the final answer for immediate contained parts, before exploding.
- The Rename/Join just pull in outside data to get QTY for the current Minor tuple.
- The Transform does the work of adding the new step in a path with the multiplicative computation.
- The final aggregate does the second computation.
There is nothing extraneous, everything here is a required part of GTC. But the criticism is that while is too powerful: it is effectively Turing complete. It should be possible to open out the mechanism of a simple TC to include a simple transform step, just like shown here, if you don't want the full while. I found while implementation to be easy and short, so that's why I preferred it this way
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
Quote from dandl on May 16, 2020, 1:15 amQuote from Erwin on May 15, 2020, 7:20 pmQuote from dandl on May 15, 2020, 1:32 amCan you show the code for the GTC example form DTATRM p238?
Yes, I agree that while can be abused and the implementation I have is probably Turing complete (therefore unsafe), I would have thought there is a middle ground, of reasonable queries that GTC cannot answer and a 'safe' while could. Alice has a bit on that, but it's fairly hard to follow.
The good news is I now understand this business of "concatenation operator" and "aggregation operator" better than I ever did before.
The bad news is, no I can't, because what the 'G' stands for in my GTCLOSE comes nowhere near what the 'G' stands for in the concerned VSS.
I explain myself.
In the example, there are two distinct reasons why P3 parts are needed to build a P1 : because two P3 parts are needed for ***direct*** inclusion and because ten (5*2) P3 parts are needed to get the five P2 parts that are also needed (I might have gotten the numbers wrong, but that shouldn't matter). I believe the TTM authors defined "concatenation" and "aggregation" ***because*** they wanted a relatively short way to get to the number 12 that is the answer to "how many P3 parts are needed to build a P1 ?". Or they did so because all the literature they could find on the subject said this was the way to go and the authors of that literature were themselves suffering just as much from the same get-the-job-done syndrome.
I'm not sure about the use of those terms or other literature. What I see is that GTC involves two steps: a first (multiplicative) computation executed on the way down the tree expanding each path, and a second (aggregating) computation to build the final result. The second part is easy, it's just aggregation. The first part is beyond ordinary TC, because it requires the path-building step to be exposed so that a computation can be added. Here is my code again:
WriteLine("While");var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty"var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty"var exp = seed.While(t => t.Rename<TupMzA>() // "Major", "zmatch", "AggQty".Join<TupzMQ, TupMMQA>(zmq).Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty)));WriteLine(exp.Format());WriteLine("P1 -> P5");WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5").Aggregate<TupMMT,int>((t,a) => a + t.AggQty).Format());WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());WriteLine("While"); var seed = MMQData.MMQ.Extend<TupMMQA>(t => t.Qty); // "Major", "Minor", "Qty", "AggQty" var zmq = MMQData.MMQ.Rename<TupzMQ>(); // "zmatch", "Minor", "Qty" var exp = seed.While(t => t .Rename<TupMzA>() // "Major", "zmatch", "AggQty" .Join<TupzMQ, TupMMQA>(zmq) .Transform<TupMMQA>(tt => TupMMQA.Create(tt.Major, tt.Minor, 0, tt.AggQty * tt.Qty))); WriteLine(exp.Format()); WriteLine("P1 -> P5"); WriteLine(exp.Select(t => t.Major == "P1" && t.Minor == "P5") .Aggregate<TupMMT,int>((t,a) => a + t.AggQty) .Format());
- The seed already has the final answer for immediate contained parts, before exploding.
- The Rename/Join just pull in outside data to get QTY for the current Minor tuple.
- The Transform does the work of adding the new step in a path with the multiplicative computation.
- The final aggregate does the second computation.
There is nothing extraneous, everything here is a required part of GTC. But the criticism is that while is too powerful: it is effectively Turing complete. It should be possible to open out the mechanism of a simple TC to include a simple transform step, just like shown here, if you don't want the full while. I found while implementation to be easy and short, so that's why I preferred it this way
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
Quote from dandl on May 17, 2020, 1:48 amI now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
But there is no argument about that. Logically there are the two steps: expand the path, performing a multiplicative computation along the path; then aggregate the result as desired. That's precisely how it's done with while. If it's a priority to have a single GTC operator, use the 'generic relational operator' capability of the language to build that out of the two parts.
The main purpose of TC, GTC and while is inflation: adding new tuples to a relation according to some rule or computation. This is an inherent property of Datalog, but uncommon in other query languages. I guess a good part of the reason is that apart from a few specific instances, there really isn't much call. I can think of family tree, parts explosion, route scheduling. The interesting question is whether TC with a narrow capability to perform a computation at each step (which along with aggregation comprise GTC) is enough, or whether there are interesting queries not addressed. There are some I think are beyond GTC:
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
I now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
But there is no argument about that. Logically there are the two steps: expand the path, performing a multiplicative computation along the path; then aggregate the result as desired. That's precisely how it's done with while. If it's a priority to have a single GTC operator, use the 'generic relational operator' capability of the language to build that out of the two parts.
The main purpose of TC, GTC and while is inflation: adding new tuples to a relation according to some rule or computation. This is an inherent property of Datalog, but uncommon in other query languages. I guess a good part of the reason is that apart from a few specific instances, there really isn't much call. I can think of family tree, parts explosion, route scheduling. The interesting question is whether TC with a narrow capability to perform a computation at each step (which along with aggregation comprise GTC) is enough, or whether there are interesting queries not addressed. There are some I think are beyond GTC:
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
Quote from AntC on May 17, 2020, 6:25 amQuote from dandl on May 17, 2020, 1:48 am
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
In my rag-bag of ad-hoc data manipulation tools (chiefly used in ERP migrations), I have a pre-built table with an INT column containing every INT from zero up to a
2 ^ 16
or so. I also have a table with every date from the invention of computers at least up to the probable death of anybody likely to need recording in the database -- that is if the ERP doesn't already have such a thing (most do).
SELECT INT1.INT AS X, INT2.INT AS Y, X * Y AS PRODUCT FROM INT AS INT1, INT AS INT2 WHERE INT1.INT <= 12 AND INT2.INT <= 12 ORDER BY X, Y;
Nearly every requirement can be turned into a query over
INT
and/or overDATE
.
Quote from dandl on May 17, 2020, 1:48 am
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
In my rag-bag of ad-hoc data manipulation tools (chiefly used in ERP migrations), I have a pre-built table with an INT column containing every INT from zero up to a 2 ^ 16
or so. I also have a table with every date from the invention of computers at least up to the probable death of anybody likely to need recording in the database -- that is if the ERP doesn't already have such a thing (most do).
SELECT INT1.INT AS X, INT2.INT AS Y, X * Y AS PRODUCT FROM INT AS INT1, INT AS INT2 WHERE INT1.INT <= 12 AND INT2.INT <= 12 ORDER BY X, Y;
Nearly every requirement can be turned into a query over INT
and/or over DATE
.
Quote from Erwin on May 17, 2020, 9:30 amQuote from dandl on May 17, 2020, 1:48 amI now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
But there is no argument about that. Logically there are the two steps: expand the path, performing a multiplicative computation along the path; then aggregate the result as desired. That's precisely how it's done with while. If it's a priority to have a single GTC operator, use the 'generic relational operator' capability of the language to build that out of the two parts.
The main purpose of TC, GTC and while is inflation: adding new tuples to a relation according to some rule or computation. This is an inherent property of Datalog, but uncommon in other query languages. I guess a good part of the reason is that apart from a few specific instances, there really isn't much call. I can think of family tree, parts explosion, route scheduling. The interesting question is whether TC with a narrow capability to perform a computation at each step (which along with aggregation comprise GTC) is enough, or whether there are interesting queries not addressed. There are some I think are beyond GTC:
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
Two steps is one too much (for getting addressed in a ***single*** operator that is offered by the system as a basic building brick).
Define an operator to do one thing, and do it right.
You also don't claim that JOIN ought to include PROJECT functionality because the user's question necessitates projection and if JOIN doesn't do it then I must add the projection to the query myself, do you ?
Nestability of the operators of the algebra is not the bug, it's the feature. Define an operator to do one thing, and do it right. Anything else leads to a new operator for each new query/user problem. (In fact, each newly written query ***IS***, in a sense, a newly written operator, and no developer in his right mind is ever going to expect from the DBMS provider that the latter has already included all the queries for the former, not ?)
As for your queries :
EXTEND ( UNPACK (RELATION ( BODY ( TUPLE (INTRANGE(INTINTERVAL(FROM(0)TO(65535)))))), ,(INTRANGE)) , (MULTIPLE12(MULT(INTRANGE,12))))RELATION selects a singleton, UNPACK turns that into a 65535-tuple one (in which the interval type is traded for the underlying point type), EXTEND computes the multiples. Nobody has ever said "range generation" can only be solved using GTC or whatever other recursion-based means. That database programmers ***think so*** because it's the only way they can do it ***in SQL*** and that they have gotten it rusted into their brains so deeply that they cannot even imagine anymore that there might be other solutions too, is their problem not mine.
RESTRICT ( TCLOSE ( PAR_CHI , (PARENT,CHILD) ) , EQ ( PARENT , CHILD ) )G not even needed. Where's the problem ?
I don't know the details of Alice's "even" problem and I'm currently not going to go looking. Engine is having open-heart surgery and I want it running as soon as possible so that next time I might be able to show results along the code.
Quote from dandl on May 17, 2020, 1:48 amI now turn to what "my" GTCLOSE can do to help solve the problem. "My" GTCLOSE allows you to compute computable properties that are properties ***of the path itself***. For example, if there is a path from "1" to "5" that contains 4 steps to get there (or that contains 5 "stops" in total) and there is a path from "5" to "7" that contains 2 steps to get there (or that contains 3 "stops" in total), then there is a path from "1" to "7" that contains 4 + 2 steps to get there (or that contains 5 + 3 - 1 "stops" in total).
"My" GTCLOSE, when used to address the kind of problem as exposed by the example of the TTM VSS, requires you to maintain the identity of the path along with its properties. Identity of the path is an ordered list of "stops", of which the relational equivalent is a relation with an "ORDINAL" attribute and a "identity-of-the-stop" attribute.
So "my" GTCLOSE can give you a relation {containing_part, contained_part, ordered_list_of intermediate_containing_parts, total_quantity_of_contained_parts_needed} that would contain TWO tuples :
{P1, P3, {}, 2}
{P1, P3, {P2}, 10}
and if the user is interested in the aggregation of quantities needed regardless of what intermediate "path of necessitation" leads to some particular sub-quantity needed, then I say "then use AGGREGATE to get there".
Can you process this list to get the the final result?
SUMMARIZEBY(<input relation>, (CONTAINING_PART,CONTAINED_PART),(QUANTITY(PLUS(QUANTITY))))
As you say, it's just straightforward aggregation.
Which is why it doesn't belong as an integrated step within the transitive closure operation. imo.
But there is no argument about that. Logically there are the two steps: expand the path, performing a multiplicative computation along the path; then aggregate the result as desired. That's precisely how it's done with while. If it's a priority to have a single GTC operator, use the 'generic relational operator' capability of the language to build that out of the two parts.
The main purpose of TC, GTC and while is inflation: adding new tuples to a relation according to some rule or computation. This is an inherent property of Datalog, but uncommon in other query languages. I guess a good part of the reason is that apart from a few specific instances, there really isn't much call. I can think of family tree, parts explosion, route scheduling. The interesting question is whether TC with a narrow capability to perform a computation at each step (which along with aggregation comprise GTC) is enough, or whether there are interesting queries not addressed. There are some I think are beyond GTC:
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
Two steps is one too much (for getting addressed in a ***single*** operator that is offered by the system as a basic building brick).
Define an operator to do one thing, and do it right.
You also don't claim that JOIN ought to include PROJECT functionality because the user's question necessitates projection and if JOIN doesn't do it then I must add the projection to the query myself, do you ?
Nestability of the operators of the algebra is not the bug, it's the feature. Define an operator to do one thing, and do it right. Anything else leads to a new operator for each new query/user problem. (In fact, each newly written query ***IS***, in a sense, a newly written operator, and no developer in his right mind is ever going to expect from the DBMS provider that the latter has already included all the queries for the former, not ?)
As for your queries :
EXTEND ( UNPACK (RELATION ( BODY ( TUPLE (INTRANGE(INTINTERVAL(FROM(0)TO(65535)))))), ,(INTRANGE)) , (MULTIPLE12(MULT(INTRANGE,12))))
RELATION selects a singleton, UNPACK turns that into a 65535-tuple one (in which the interval type is traded for the underlying point type), EXTEND computes the multiples. Nobody has ever said "range generation" can only be solved using GTC or whatever other recursion-based means. That database programmers ***think so*** because it's the only way they can do it ***in SQL*** and that they have gotten it rusted into their brains so deeply that they cannot even imagine anymore that there might be other solutions too, is their problem not mine.
RESTRICT ( TCLOSE ( PAR_CHI , (PARENT,CHILD) ) , EQ ( PARENT , CHILD ) )
G not even needed. Where's the problem ?
I don't know the details of Alice's "even" problem and I'm currently not going to go looking. Engine is having open-heart surgery and I want it running as soon as possible so that next time I might be able to show results along the code.
Quote from Erwin on May 17, 2020, 9:44 amQuote from AntC on May 17, 2020, 6:25 amQuote from dandl on May 17, 2020, 1:48 am
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
In my rag-bag of ad-hoc data manipulation tools (chiefly used in ERP migrations), I have a pre-built table with an INT column containing every INT from zero up to a
2 ^ 16
or so. I also have a table with every date from the invention of computers at least up to the probable death of anybody likely to need recording in the database -- that is if the ERP doesn't already have such a thing (most do).
SELECT INT1.INT AS X, INT2.INT AS Y, X * Y AS PRODUCT FROM INT AS INT1, INT AS INT2 WHERE INT1.INT <= 12 AND INT2.INT <= 12 ORDER BY X, Y;
Nearly every requirement can be turned into a query over
INT
and/or overDATE
.At the workplace, we have a "table" of every theoretically possible citizen id mapped to its anonymized counterparts used for communicating to various outside parties. It's a relation in the relational sense of the word, but not a database table. We do the JOINs using good old COBOL flatfile merge programs. Actually it's PL/1, but COBOL gives a so much better impression of what the techniques are like.
Quote from AntC on May 17, 2020, 6:25 amQuote from dandl on May 17, 2020, 1:48 am
- Generate a 12 times multiplication table (or a multiplication table for N)
- Given a family tree, check whether there are any loops
- The Alice even query
I have found the ability to generate sequences surprisingly useful. Since while is enough and easy to implement, I looked no further.
In my rag-bag of ad-hoc data manipulation tools (chiefly used in ERP migrations), I have a pre-built table with an INT column containing every INT from zero up to a
2 ^ 16
or so. I also have a table with every date from the invention of computers at least up to the probable death of anybody likely to need recording in the database -- that is if the ERP doesn't already have such a thing (most do).
SELECT INT1.INT AS X, INT2.INT AS Y, X * Y AS PRODUCT FROM INT AS INT1, INT AS INT2 WHERE INT1.INT <= 12 AND INT2.INT <= 12 ORDER BY X, Y;
Nearly every requirement can be turned into a query over
INT
and/or overDATE
.
At the workplace, we have a "table" of every theoretically possible citizen id mapped to its anonymized counterparts used for communicating to various outside parties. It's a relation in the relational sense of the word, but not a database table. We do the JOINs using good old COBOL flatfile merge programs. Actually it's PL/1, but COBOL gives a so much better impression of what the techniques are like.