Heading inference, not type inference
Quote from dandl on May 21, 2020, 1:48 amMy work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
Proposition:
- There is exactly one relation type
- Therefore there is no relation type inference, and no relation type errors
- Each RT is associated with a heading
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
My work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
Proposition:
- There is exactly one relation type
- Therefore there is no relation type inference, and no relation type errors
- Each RT is associated with a heading
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
Quote from AntC on May 21, 2020, 5:16 amQuote from dandl on May 21, 2020, 1:48 amMy work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
TTM does not claim a 'Heading' corresponds/is mapped to a type.
REL{ }
it calls a 'type generator', hand-waving past what type(s) it generates.You seem to be doing here something even more hand-wavy and vague. How do you define 'type' as in "lots of extra types"? Is that definition specific to what C# calls a type? And as opposed to what C# calls a template/generic? What do other languages mean by 'type'?
If we do equate 'different Heading' with 'different type', we'd expect there to be plenty of types. I've no idea what you mean by "extra": a program needs at least as many types as it has values of different type (disjoint sets of values/values that are incomparable because different type). Valid (sets of) tuples for
S
are disjoint from valid (sets of) tuples forP
. Then they might as well be different type; I don't see that's making "extra" types.
Proposition:
- There is exactly one relation type
Doesn't make sense: you can't compare relations that are values for
S
with those forP
norSP
. The S&P database therefore has at least three distinct relation types. You can project away arbitrary attributes from those, so that's 23 = 8 distinct types produced fromSP
alone.
- Therefore there is no relation type inference, and no relation type errors
Therefore you don't seem to know what you're talking about: of course relational operators give us rules for inferring the types of their relational results from their relational operands. Appendix A FORMAL DEFINITIONS in setbuilder gives them.
- Each RT is associated with a heading
You've just said there's exactly one relation type. What then is the denotation of T in "Each RT"? This would appear to be gibberish.
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
I agree the VSS for generic relational operators is rather vague. Each relational operator in Tutorial D is defined by equivalences in Appendix A to consume one or two relation type(s) and emit another (not necessarily different). That set of operators is relationally complete. Therefore any other relational operator should be definable in terms of them. Merely, Tutorial D doesn't provide a way to define generic relational operators.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
There's plenty of possible errors: Extend with an attribute that's already there; Rename from an attribute that's not there/to an attribute that's already there;
GROUP
where the introduced attribute is already there/the attributes to group aren't, ...UNION/INTERSECT
with different Headings;TIMES
with non-disjoint Headings; ...What are you even talking about? I can't make any sense of any of this.
Quote from dandl on May 21, 2020, 1:48 amMy work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
TTM does not claim a 'Heading' corresponds/is mapped to a type. REL{ }
it calls a 'type generator', hand-waving past what type(s) it generates.
You seem to be doing here something even more hand-wavy and vague. How do you define 'type' as in "lots of extra types"? Is that definition specific to what C# calls a type? And as opposed to what C# calls a template/generic? What do other languages mean by 'type'?
If we do equate 'different Heading' with 'different type', we'd expect there to be plenty of types. I've no idea what you mean by "extra": a program needs at least as many types as it has values of different type (disjoint sets of values/values that are incomparable because different type). Valid (sets of) tuples for S
are disjoint from valid (sets of) tuples for P
. Then they might as well be different type; I don't see that's making "extra" types.
Proposition:
- There is exactly one relation type
Doesn't make sense: you can't compare relations that are values for S
with those for P
nor SP
. The S&P database therefore has at least three distinct relation types. You can project away arbitrary attributes from those, so that's 23 = 8 distinct types produced from SP
alone.
- Therefore there is no relation type inference, and no relation type errors
Therefore you don't seem to know what you're talking about: of course relational operators give us rules for inferring the types of their relational results from their relational operands. Appendix A FORMAL DEFINITIONS in setbuilder gives them.
- Each RT is associated with a heading
You've just said there's exactly one relation type. What then is the denotation of T in "Each RT"? This would appear to be gibberish.
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
I agree the VSS for generic relational operators is rather vague. Each relational operator in Tutorial D is defined by equivalences in Appendix A to consume one or two relation type(s) and emit another (not necessarily different). That set of operators is relationally complete. Therefore any other relational operator should be definable in terms of them. Merely, Tutorial D doesn't provide a way to define generic relational operators.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
There's plenty of possible errors: Extend with an attribute that's already there; Rename from an attribute that's not there/to an attribute that's already there; GROUP
where the introduced attribute is already there/the attributes to group aren't, ... UNION/INTERSECT
with different Headings; TIMES
with non-disjoint Headings; ...
What are you even talking about? I can't make any sense of any of this.
Quote from dandl on May 21, 2020, 11:25 amQuote from AntC on May 21, 2020, 5:16 amQuote from dandl on May 21, 2020, 1:48 amMy work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
TTM does not claim a 'Heading' corresponds/is mapped to a type.
REL{ }
it calls a 'type generator', hand-waving past what type(s) it generates.You seem to be doing here something even more hand-wavy and vague. How do you define 'type' as in "lots of extra types"? Is that definition specific to what C# calls a type? And as opposed to what C# calls a template/generic? What do other languages mean by 'type'?
If we do equate 'different Heading' with 'different type', we'd expect there to be plenty of types. I've no idea what you mean by "extra": a program needs at least as many types as it has values of different type (disjoint sets of values/values that are incomparable because different type). Valid (sets of) tuples for
S
are disjoint from valid (sets of) tuples forP
. Then they might as well be different type; I don't see that's making "extra" types.That's the nub of the problem. TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types. If this is so then each RA operation amounts to an implied type conversion (from one heading to another). Every user-defined variable, value, argument, etc is of one specific type (aka heading), and many reasonable data manipulations will require to be accompanied by a series of type conversions.
I'm exploring what happens if there is only a single relation type, and the only type error is an attribute missing or mismatched.
Proposition:
- There is exactly one relation type
Doesn't make sense: you can't compare relations that are values for
S
with those forP
norSP
. The S&P database therefore has at least three distinct relation types. You can project away arbitrary attributes from those, so that's 23 = 8 distinct types produced fromSP
alone.Values with different headings are not equal.
- Therefore there is no relation type inference, and no relation type errors
Therefore you don't seem to know what you're talking about: of course relational operators give us rules for inferring the types of their relational results from their relational operands. Appendix A FORMAL DEFINITIONS in setbuilder gives them.
Read on. That's covered by heading inference.
- Each RT is associated with a heading
You've just said there's exactly one relation type. What then is the denotation of T in "Each RT"? This would appear to be gibberish.
Sorry, each relation value has a heading (see RM Pre 10). The type is always 'relation'.
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
I agree the VSS for generic relational operators is rather vague. Each relational operator in Tutorial D is defined by equivalences in Appendix A to consume one or two relation type(s) and emit another (not necessarily different). That set of operators is relationally complete. Therefore any other relational operator should be definable in terms of them. Merely, Tutorial D doesn't provide a way to define generic relational operators.
The coverage is reasonable, but it has some shorthands and some gaps. INSERT, DELETE and UPDATE are shorthands, but you cannot defined them in TD using assignment as a primitive. TD does not have a TRANSFORM shorthand, and you cannot define one. It does not have GTC or while, and you cannot add them. We've discussed this here, and it turns out to be rather hard. If relation as a type is generic, I think that gets easier.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
There's plenty of possible errors: Extend with an attribute that's already there; Rename from an attribute that's not there/to an attribute that's already there;
GROUP
where the introduced attribute is already there/the attributes to group aren't, ...UNION/INTERSECT
with different Headings;TIMES
with non-disjoint Headings; ...What are you even talking about? I can't make any sense of any of this.
I would argue that treating these as type errors is a decision made by the authors of TD. The fact is that in each case the error is visible by inspecting the relevant headings, and mostly they fall into the same two categories: trying to access an undefined attribute, or two headings with conflicting attributes.
I don't reject the idea that a compiler does heading inference, in which case it will pick up the last two cases, but even that is a choice. Generalised Union is a plausible alternative, simply project each argument onto the common attributes. And TIMES can be just an alias for a join.
The point is this is just a shift in emphasis, from one generated type per heading to one generic type plus a heading. The first relies on type inference, the second on heading inference. The first treats non-scalar types as islands with guarded bridges joining them, the second treats them as shared living, with easy movement between them.
Quote from AntC on May 21, 2020, 5:16 amQuote from dandl on May 21, 2020, 1:48 amMy work on 'C# as D' shows that treating each tuple/relation with a different heading as a different type leads to the need to create lots of extra types, to no real purpose. TTM aims do not seem to demand this.
TTM does not claim a 'Heading' corresponds/is mapped to a type.
REL{ }
it calls a 'type generator', hand-waving past what type(s) it generates.You seem to be doing here something even more hand-wavy and vague. How do you define 'type' as in "lots of extra types"? Is that definition specific to what C# calls a type? And as opposed to what C# calls a template/generic? What do other languages mean by 'type'?
If we do equate 'different Heading' with 'different type', we'd expect there to be plenty of types. I've no idea what you mean by "extra": a program needs at least as many types as it has values of different type (disjoint sets of values/values that are incomparable because different type). Valid (sets of) tuples for
S
are disjoint from valid (sets of) tuples forP
. Then they might as well be different type; I don't see that's making "extra" types.
That's the nub of the problem. TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types. If this is so then each RA operation amounts to an implied type conversion (from one heading to another). Every user-defined variable, value, argument, etc is of one specific type (aka heading), and many reasonable data manipulations will require to be accompanied by a series of type conversions.
I'm exploring what happens if there is only a single relation type, and the only type error is an attribute missing or mismatched.
Proposition:
- There is exactly one relation type
Doesn't make sense: you can't compare relations that are values for
S
with those forP
norSP
. The S&P database therefore has at least three distinct relation types. You can project away arbitrary attributes from those, so that's 23 = 8 distinct types produced fromSP
alone.
Values with different headings are not equal.
- Therefore there is no relation type inference, and no relation type errors
Therefore you don't seem to know what you're talking about: of course relational operators give us rules for inferring the types of their relational results from their relational operands. Appendix A FORMAL DEFINITIONS in setbuilder gives them.
Read on. That's covered by heading inference.
- Each RT is associated with a heading
You've just said there's exactly one relation type. What then is the denotation of T in "Each RT"? This would appear to be gibberish.
Sorry, each relation value has a heading (see RM Pre 10). The type is always 'relation'.
- Generic operators perform heading inference
- Otherwise, RT values are projected onto a heading as needed
- The above should be read as applying equally to tuple types.
Note that attribute type errors will occur with:
- Projection onto an unknown attribute
- Two attributes of the same name and different types
- Faulty attribute references in operators (such as WHERE and EXTEND in TD) (*)
Note (*) this category is not intrinsic to the RA but arises because of the particular syntactic form of TD. It can be avoided by using function relations, as per App-A.
Why? TTM seems to require that each RT be unique and incompatible, and that somehow the operators of the RA are able to consume one and emit another, and that the compiler be able to infer and create such types. Then it asks for generic operators as something separate.
I agree the VSS for generic relational operators is rather vague. Each relational operator in Tutorial D is defined by equivalences in Appendix A to consume one or two relation type(s) and emit another (not necessarily different). That set of operators is relationally complete. Therefore any other relational operator should be definable in terms of them. Merely, Tutorial D doesn't provide a way to define generic relational operators.
The coverage is reasonable, but it has some shorthands and some gaps. INSERT, DELETE and UPDATE are shorthands, but you cannot defined them in TD using assignment as a primitive. TD does not have a TRANSFORM shorthand, and you cannot define one. It does not have GTC or while, and you cannot add them. We've discussed this here, and it turns out to be rather hard. If relation as a type is generic, I think that gets easier.
If instead the RT is taken to be just a single type the compiler only needs to track headings and there is really only one possible error: projection onto a missing or mismatched attribute. And you get a large measure of generic operators for free. And my 'C# as D' looks like it might be a tad easier.
There's plenty of possible errors: Extend with an attribute that's already there; Rename from an attribute that's not there/to an attribute that's already there;
GROUP
where the introduced attribute is already there/the attributes to group aren't, ...UNION/INTERSECT
with different Headings;TIMES
with non-disjoint Headings; ...What are you even talking about? I can't make any sense of any of this.
I would argue that treating these as type errors is a decision made by the authors of TD. The fact is that in each case the error is visible by inspecting the relevant headings, and mostly they fall into the same two categories: trying to access an undefined attribute, or two headings with conflicting attributes.
I don't reject the idea that a compiler does heading inference, in which case it will pick up the last two cases, but even that is a choice. Generalised Union is a plausible alternative, simply project each argument onto the common attributes. And TIMES can be just an alias for a join.
The point is this is just a shift in emphasis, from one generated type per heading to one generic type plus a heading. The first relies on type inference, the second on heading inference. The first treats non-scalar types as islands with guarded bridges joining them, the second treats them as shared living, with easy movement between them.
Quote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
Hugh
I respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
Hugh
Quote from dandl on May 22, 2020, 3:32 pmQuote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
My intention in using the term was to refer to the type that is 'generated' when a user explicitly declares a variable or value and specifies a full heading. The type does not have a user-supplied name, but the details (heading attribute names and types) are all user-defined. If there is a better term, I'll be happy to use it.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
That seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? Say at point (x) in the following:
SP JOIN S (x) JOIN P
In a sense though, that is consistent with what I'm proposing. For most purposes it's enough to say: 'this is a relational type with heading H', and all needed inference can be performed using just the heading. The only real type errors that can arise are from an assignment or operator invocation with a mismatched or missing attribute. [Yes, I know there are rules around heading compatibility for various relational operators, but there are also choices how to deal with them, short of throwing a type error).
But perhaps I misunderstand what you mean by type generator, and some explanation may assist.
Hugh
Quote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
Hugh
Quote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
My intention in using the term was to refer to the type that is 'generated' when a user explicitly declares a variable or value and specifies a full heading. The type does not have a user-supplied name, but the details (heading attribute names and types) are all user-defined. If there is a better term, I'll be happy to use it.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
That seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? Say at point (x) in the following:
SP JOIN S (x) JOIN P
In a sense though, that is consistent with what I'm proposing. For most purposes it's enough to say: 'this is a relational type with heading H', and all needed inference can be performed using just the heading. The only real type errors that can arise are from an assignment or operator invocation with a mismatched or missing attribute. [Yes, I know there are rules around heading compatibility for various relational operators, but there are also choices how to deal with them, short of throwing a type error).
But perhaps I misunderstand what you mean by type generator, and some explanation may assist.
Hugh
Quote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
Hugh
Quote from Erwin on May 22, 2020, 9:02 pmQuote from dandl on May 22, 2020, 3:32 pmThat seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
Earlier in this thread, you stated you were exploring "what was going to happen" if you relaxed all those rigorous rigid prescriptions so that the "prescriptions" got to be more of a fit with what you're able to accomplish (within the timeframe that's available to you for getting stuff done). Well, the foregoing story is a real-life example of what happens if you do that. Learn from other people's mistakes. Life is too short to make them all yourself.
Quote from dandl on May 22, 2020, 3:32 pm
That seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
Earlier in this thread, you stated you were exploring "what was going to happen" if you relaxed all those rigorous rigid prescriptions so that the "prescriptions" got to be more of a fit with what you're able to accomplish (within the timeframe that's available to you for getting stuff done). Well, the foregoing story is a real-life example of what happens if you do that. Learn from other people's mistakes. Life is too short to make them all yourself.
Quote from Dave Voorhis on May 22, 2020, 9:36 pmQuote from dandl on May 22, 2020, 3:32 pmQuote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
My intention in using the term was to refer to the type that is 'generated' when a user explicitly declares a variable or value and specifies a full heading. The type does not have a user-supplied name, but the details (heading attribute names and types) are all user-defined. If there is a better term, I'll be happy to use it.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
That seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? Say at point (x) in the following:
SP JOIN S (x) JOIN P
I'm not sure how you'd infer "typeless" at that point. Evaluating expressions consisting of operator invocations is always described in terms of operator operand and return types, so SP JOIN S determines an intermediate type which is JOINed with P to obtain the type of the expression.
In other words, (the type of) every expression can be described entirely in terms of (the types of) its subexpressions.
Quote from dandl on May 22, 2020, 3:32 pmQuote from Hugh on May 22, 2020, 11:23 amI respond to this from dandl.
"TTM does not make it clear whether a type needs to be 'generated' for each intermediate result in evaluating an RA expression, or what the consequences might be if those invisible generated types happen to conflict with user defined tuple types."
First, TTM does not support user-defined tuple types or relation types. Attempt such support at your peril!.
My intention in using the term was to refer to the type that is 'generated' when a user explicitly declares a variable or value and specifies a full heading. The type does not have a user-supplied name, but the details (heading attribute names and types) are all user-defined. If there is a better term, I'll be happy to use it.
But anyway, I sense a confusion between type generation and type inference.
A generated type is generated by an invocation of a type generator. E.g., RELATION{i INT, j INT}, TUPLE{x CHAR, y CHAR}. Such invocations are used in definitions of variables, operators, etc.
The type of an expression denoting a relational operator invocation is inferred from the type(s) of the operand expressions(s) and the semantics of the operator in question. This does not involve type generation.
That seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? Say at point (x) in the following:
SP JOIN S (x) JOIN P
I'm not sure how you'd infer "typeless" at that point. Evaluating expressions consisting of operator invocations is always described in terms of operator operand and return types, so SP JOIN S determines an intermediate type which is JOINed with P to obtain the type of the expression.
In other words, (the type of) every expression can be described entirely in terms of (the types of) its subexpressions.
Quote from dandl on May 23, 2020, 1:20 amQuote from Erwin on May 22, 2020, 9:02 pmQuote from dandl on May 22, 2020, 3:32 pmThat seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
That's a lot of words for two rather small points. Yes, of course GROUP has to be type-checked with exactly the same stringency as any other relational value. Andl always did that, and I'm surprised you left it out.
As far as the point I am making in this thread, a grouped attribute value (such as
PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.Earlier in this thread, you stated you were exploring "what was going to happen" if you relaxed all those rigorous rigid prescriptions so that the "prescriptions" got to be more of a fit with what you're able to accomplish (within the timeframe that's available to you for getting stuff done). Well, the foregoing story is a real-life example of what happens if you do that. Learn from other people's mistakes. Life is too short to make them all yourself.
One of the wonderful things about science is that it records and makes available the mistakes made by others, precisely so that you can quickly and efficiently replicate those mistakes in the search for something better. The tradesman repeats known ways, but the scientist finds new ways. Unfortunately your anecdote does not provide the information required to replicate this particular error, so it's hard to use it scientifically.
Quote from Erwin on May 22, 2020, 9:02 pmQuote from dandl on May 22, 2020, 3:32 pmThat seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
That's a lot of words for two rather small points. Yes, of course GROUP has to be type-checked with exactly the same stringency as any other relational value. Andl always did that, and I'm surprised you left it out.
As far as the point I am making in this thread, a grouped attribute value (such as PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.
Earlier in this thread, you stated you were exploring "what was going to happen" if you relaxed all those rigorous rigid prescriptions so that the "prescriptions" got to be more of a fit with what you're able to accomplish (within the timeframe that's available to you for getting stuff done). Well, the foregoing story is a real-life example of what happens if you do that. Learn from other people's mistakes. Life is too short to make them all yourself.
One of the wonderful things about science is that it records and makes available the mistakes made by others, precisely so that you can quickly and efficiently replicate those mistakes in the search for something better. The tradesman repeats known ways, but the scientist finds new ways. Unfortunately your anecdote does not provide the information required to replicate this particular error, so it's hard to use it scientifically.
Quote from AntC on May 23, 2020, 11:31 amQuote from dandl on May 23, 2020, 1:20 amQuote from Erwin on May 22, 2020, 9:02 pmQuote from dandl on May 22, 2020, 3:32 pmThat seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
That's a lot of words for two rather small points. Yes, of course GROUP has to be type-checked with exactly the same stringency as any other relational value. Andl always did that, and I'm surprised you left it out.
As far as the point I am making in this thread, a grouped attribute value (such as
PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.I'm continuing to not see what's at issue here. "Heading inference, not type inference". This seems like a distinction without a difference. If the compiler is inferring the Heading for all intermediate sub-terms within an expression; what is it not doing that means it's not inferring the type?
What's "unique" doing in that paragraph above?
PQ
on p23 is an RVA. The attribute type is therefore some relation type. That type might or not be unique within some D program. So what? The type must get inferred for the invocation of theGROUP
operator (or similar) that produces the RVA.Erwin's point seems quite clear and valid to me: in a language that supports arbitrary nesting of operator invocations, type inference for intermediate sub-terms must be capable of recursing all the way down those invocations.
If the language features polymorphism, overloading, container types of arbitrary polymorphic types (like relations over arbitrary Headings), then type inference (or even type-checking) becomes a very hard problem. Descriptions of sophisticated type systems these days come with machine-checked proofs of the coherence, confluence and time-complexity for inference.
[amateur Popperianism left out]
Quote from dandl on May 23, 2020, 1:20 amQuote from Erwin on May 22, 2020, 9:02 pmQuote from dandl on May 22, 2020, 3:32 pmThat seems curious. Do you intend that intermediate values arising in the evaluation of a relational expression should be regarded as typeless? :
No. All values are of a type and if the compiler (and its type-safety-ensuring-component in particular) must verify something, anything, about any possible value that might pop up in "intermediate results", then [if that value happens to be nonscalar] the type ***MUST*** be "generated". That "generating" is ***NECESSARY*** for the compiler to have ***ANY UNDERSTANDING*** of what is going on, type-safety-wise. (That "generating" does ***NOT*** involve anything more than computing the headings, computation which you will also need if you want your "heading inference" to be applied in due time within the process, so the issue you seem to have is imo not almost but more than certainly a non-one.)
I'll recant the story of a user who had devised a really horrible nesting of RA expressions that involved, (A.O. !!!) 7 (SEVEN) invocations of GROUP. Now at that time, SIRA_PRISE didn't perform heading checks on GROUP expressions the way TTM and its authors would have ordered me to, rather it did almost ***EXACTLY*** what you appear to be proposing here : "hey, if it's a relation that comes out it'll probably be OK so I'll let it pass. All that nifty heading compatibility checking is just too much work right here right now so I'll let it pass", or "hey, it's only an intermediate result here, I'm not going to do all that difficult and expensive type generation stuff only to aid the user in the "extremely rare" case where he got his business wrong - after all, if ***HE*** gets something wrong, ***HE*** still has a debugger". But that user made a mistake somewhere right in the middle of those seven invocations. Took me a ***WHOLE WEEK*** to pin it down. (And that's adding up to the time the user had himself already invested in trying to figure out what was wrong.)
It's always the same. If you stretch the expressive power of a system to completely unexplored levels such as TTM's then you think things like "hey, this GROUP thing really seems like a powerful computational device and users will certainly be impressed with what they can achieve with it even if they need to use it ***only once*** in any expression and they'll certainly appreciate the risks of working with things that powerful and that complex, so they will certainly be VERY cautious and it will be very rare to see a user use more than one instance of this in any query". Yeah. SEVEN. And the TTM authors had told me I should have gotten the type checking right before publishing.
That's a lot of words for two rather small points. Yes, of course GROUP has to be type-checked with exactly the same stringency as any other relational value. Andl always did that, and I'm surprised you left it out.
As far as the point I am making in this thread, a grouped attribute value (such as
PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.
I'm continuing to not see what's at issue here. "Heading inference, not type inference". This seems like a distinction without a difference. If the compiler is inferring the Heading for all intermediate sub-terms within an expression; what is it not doing that means it's not inferring the type?
What's "unique" doing in that paragraph above? PQ
on p23 is an RVA. The attribute type is therefore some relation type. That type might or not be unique within some D program. So what? The type must get inferred for the invocation of the GROUP
operator (or similar) that produces the RVA.
Erwin's point seems quite clear and valid to me: in a language that supports arbitrary nesting of operator invocations, type inference for intermediate sub-terms must be capable of recursing all the way down those invocations.
If the language features polymorphism, overloading, container types of arbitrary polymorphic types (like relations over arbitrary Headings), then type inference (or even type-checking) becomes a very hard problem. Descriptions of sophisticated type systems these days come with machine-checked proofs of the coherence, confluence and time-complexity for inference.
[amateur Popperianism left out]
Quote from dandl on May 24, 2020, 2:13 amAs far as the point I am making in this thread, a grouped attribute value (such as
PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.I'm continuing to not see what's at issue here. "Heading inference, not type inference". This seems like a distinction without a difference. If the compiler is inferring the Heading for all intermediate sub-terms within an expression; what is it not doing that means it's not inferring the type?
In a sense I agree. My proposal is to consider separately the conventional type checking used for the scalar types from the rather unconventional heading inference of the RA operators (which, BTW are never spelled out in TTM or DTATRM, except by reference to TD). You might want to call it a type parameter or type specialisation or polymorphism or something else, it's just different.
What type system terminology do you use to describe C++/C#/Java OO? What do you use for templates/generics? The TTM headings and their relationship to the type system are as different as those are from the simple primitives they all share.
What's "unique" doing in that paragraph above?
PQ
on p23 is an RVA. The attribute type is therefore some relation type. That type might or not be unique within some D program. So what? The type must get inferred for the invocation of theGROUP
operator (or similar) that produces the RVA.Of course. Except I would describe it as a relation type with a heading that must be inferred.
Erwin's point seems quite clear and valid to me: in a language that supports arbitrary nesting of operator invocations, type inference for intermediate sub-terms must be capable of recursing all the way down those invocations.
Ditto heading inference.
If the language features polymorphism, overloading, container types of arbitrary polymorphic types (like relations over arbitrary Headings), then type inference (or even type-checking) becomes a very hard problem. Descriptions of sophisticated type systems these days come with machine-checked proofs of the coherence, confluence and time-complexity for inference.
Heading inference is trivial. Why complicate?
As far as the point I am making in this thread, a grouped attribute value (such as
PQ
on p23 of DTATRM) is an argument to a selector operator and since both value and selector specify a heading, those headings must be checked (recursively in this case). But it is not necessary to generate a unique type in order to do that. I'm proposing to distinguish the concepts of (strict) type safety from (rule-based) heading compatibility.I'm continuing to not see what's at issue here. "Heading inference, not type inference". This seems like a distinction without a difference. If the compiler is inferring the Heading for all intermediate sub-terms within an expression; what is it not doing that means it's not inferring the type?
In a sense I agree. My proposal is to consider separately the conventional type checking used for the scalar types from the rather unconventional heading inference of the RA operators (which, BTW are never spelled out in TTM or DTATRM, except by reference to TD). You might want to call it a type parameter or type specialisation or polymorphism or something else, it's just different.
What type system terminology do you use to describe C++/C#/Java OO? What do you use for templates/generics? The TTM headings and their relationship to the type system are as different as those are from the simple primitives they all share.
What's "unique" doing in that paragraph above?
PQ
on p23 is an RVA. The attribute type is therefore some relation type. That type might or not be unique within some D program. So what? The type must get inferred for the invocation of theGROUP
operator (or similar) that produces the RVA.
Of course. Except I would describe it as a relation type with a heading that must be inferred.
Erwin's point seems quite clear and valid to me: in a language that supports arbitrary nesting of operator invocations, type inference for intermediate sub-terms must be capable of recursing all the way down those invocations.
Ditto heading inference.
If the language features polymorphism, overloading, container types of arbitrary polymorphic types (like relations over arbitrary Headings), then type inference (or even type-checking) becomes a very hard problem. Descriptions of sophisticated type systems these days come with machine-checked proofs of the coherence, confluence and time-complexity for inference.
Heading inference is trivial. Why complicate?