The case for an RA based on generics instead of types
Quote from dandl on April 7, 2020, 2:20 pm

Fully half of TTM is devoted to devising a type system. Unfortunately the result is one that is very hard to reconcile with other languages. In point of fact, the need for designing a whole new language has passed; it would be better to find a way to embed key features, including the RA of RM Pre 18, into a regular language with a familiar type system. An approach based on templates/generics can do that.
The rationale is precisely the same as the one that drove the STL in C++. The operators of the RA are a set of generic algorithms, in the form of functions, which have to be specialised to apply to specific relation arguments and return values. Here is an example, in a C++/C#/Java-like style.
r-value extend<args,fn,name>(r-value)
This says that extend is a function that takes a relation value as an argument and returns a relation value as its result. This RA function is specialised by template/generic parameters as follows:
- args is an ordered list of zero or more attribute names that specify the arguments for an invocation of the function fn
- fn is the name of a function (the extend computation), with a type signature to match args
- name is the attribute name for the new computed value; its type will be that of the return value of fn
- the heading of the r-value argument plus name provides the heading for the return value.

I've worked through the full set of extended operators, including aggregation, recursion and assignment. It all fits together nicely, see this.
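To make that shape concrete, here is a rough C# rendering of a single specialisation of extend. This is a sketch only: the proposal is a pre-processed notation, not a hosted generic method, and the Part/PartWithArea headings and the ExtendWithArea name are invented for illustration.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical headings: the argument relation, and a result relation whose
    // heading is the argument's heading plus the new computed attribute Area.
    public record Part(string Name, double Length, double Width);
    public record PartWithArea(string Name, double Length, double Width, double Area);

    public static class RaSketch
    {
        // One specialisation of extend<args, fn, name>, where args = (Length, Width),
        // fn = the supplied function, and name = Area.
        public static IEnumerable<PartWithArea> ExtendWithArea(
            this IEnumerable<Part> r, Func<double, double, double> fn) =>
            r.Select(t => new PartWithArea(t.Name, t.Length, t.Width, fn(t.Length, t.Width)));
    }

A call such as parts.ExtendWithArea((l, w) => l * w) then plays the role of extend<(Length, Width), area, Area>(parts); the point of the proposal is that the pre-processor would derive the result heading rather than having it written by hand.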
A query (or update) written in this form can be interpreted live, or compiled into any modern high level language, TD or SQL. The target environment needs to supply a support library of callable functions. The compiler can perform type and heading inference, and guarantee safety.
This approach does not address other aspects of TTM, such as those concerning features provided by the database; it deals only with the parts that depend on a new language and new type system.
Quote from Dave Voorhis on April 7, 2020, 4:28 pm

Do you mean that in your example, extend is a generic method in a distinct language that is transpiled -- say, by a preprocessor -- to code for a modern high level language, TD or SQL?
Or do you mean -- and this is what I understood until I read "... compiled into any modern high level language, TD or SQL" -- that extend is a generic method (I assume belonging to a class named R_Value or similar that represents an r-value, which I presume is a relation) defined in a modern high level language like C++, C# or Java (any of which should support extend as shown) and could internally either evaluate in its native language (i.e., C++, C#, or Java), or generate and execute TD or SQL code?
If it's the latter, I'm not clear how it's conceptually distinct from C# LINQ or Java Streams?
Or is that precisely the idea, that it's meant to be conceptually the same approach as LINQ or Streams (and maybe even compatible with one or the other?) but based on a relational algebra rather than fold/map/filter/collect?
Quote from AntC on April 8, 2020, 12:39 am

Quote from dandl on April 7, 2020, 2:20 pm
Fully half of TTM is devoted to devising a type system. Unfortunately the result is one that is very hard to reconcile with other languages. In point of fact, the need for designing a whole new language has passed; it would be better to find a way to embed key features, including the RA of RM Pre 18, into a regular language with a familiar type system. An approach based on templates/generics can do that.
The problem with templates (in nearly every language, including C++) is that they can generate invalid/ill-typed code. Generics, in some languages, can also generate ill-typed code; or they need dynamic typing, which can fail at run time -- which we might as well categorise as ill-typed code.
Whether we need "a whole new language" is beside the point. We need a whole new way to do type signatures and type inference for the non-scalar types part of a DML.
The rationale is precisely the same as the one that drove the STL in C++. The operators of the RA are a set of generic algorithms, in the form of functions, which have to be specialised to apply to specific relation arguments and return values.
I'd say: the operators of an RA are polymorphic (ad-hoc/overloaded functions): they take operands with partially-known type signatures (or with a partially-known correspondence between the operands); and return a result with a partially-known signature. Yes this gets specialised when you provide specific arguments -- but that's just well-understood polymorphism/unification.
Here is an example, in a C++/C#/Java-like style.
r-value extend<args,fn,name>(r-value)
This says that extend is a function that takes a relation value as an argument and returns a relation value as its result. This RA function is specialised by template/generic parameters as follows:
- args is an ordered list of zero or more attribute names that specify the arguments for an invocation of the function fn
- fn is the name of a function (the extend computation), with a type signature to match args
- name is the attribute name for the new computed value; its type will be that of the return value of fn
- the heading of the r-value argument plus name provides the heading for the return value.

I've worked through the full set of extended operators, including aggregation, recursion and assignment. It all fits together nicely, see this.
A query (or update) written in this form can be interpreted live, or compiled into any modern high level language, TD or SQL. The target environment needs to supply a support library of callable functions. The compiler can perform type and heading inference, and guarantee safety.
This approach does not address other aspects of TTM, such as those concerning features provided by the database; it deals only with the parts that depend on a new language and new type system.
Does your approach give code that can be guaranteed statically type-safe? Does it support separate compilation, for example:
- In the library module (separately compiled to be statically type-safe), a routine to take a relation with attributes xCoord, yCoord amongst others and return a relation with attributes rho, theta instead.
- In the client module, statically reject a call to this routine if the argument doesn't contain xCoord, yCoord or does contain rho, theta. Statically infer that the result doesn't include xCoord, yCoord (even if it can't satisfy the presence of those attributes in the argument).
- If the client module is itself a library in which the call to the routine must be polymorphic, pass on those polymorphism restrictions to the ultimate client; possibly adding extra restrictions (and/or satisfying some of the restrictions). All this statically type-safe, of course. By "satisfying some" I mean, for example, that the compiler can see xCoord, yCoord OK, but can't satisfy the absence of rho, theta.

The number of Operators in your list seems excessive. Can't I use a small core of operators to combine into that longer list? Specifically, if the core operators are polymorphic enough, it should be possible to combine them to form polymorphic derived operators. And if you can't do that, how can you be sure you can use those operators to produce arbitrarily complex relational expressions?
For example, why is there both join( ) and intersect( )? Isn't intersect( ) just a type-specialisation of join( )? How do you express that the arguments to intersect( ) must have the same heading?
Quote from dandl on April 8, 2020, 12:46 am

Quote from Dave Voorhis on April 7, 2020, 4:28 pm
Do you mean that in your example, extend is a generic method in a distinct language that is transpiled -- say, by a preprocessor -- to code for a modern high level language, TD or SQL?
Yes. Or stand-alone, such as a query GUI, spreadsheet or dataflow language. The gold standard: JOIN a CSV file to a Web API, SQL engine supported but not required.
Or do you mean -- and this is what I understood until I read "... compiled into any modern high level language, TD or SQL" -- that extend is a generic method (I assume belonging to a class named R_Value or similar that represents an r-value, which I presume is a relation) defined in a modern high level language like C++, C# or Java (any of which should support extend as shown) and could internally either evaluate in its native language (i.e., C++, C#, or Java), or generate and execute TD or SQL code?
If it's the latter, I'm not clear how it's conceptually distinct from C# LINQ or Java Streams?
The r-value is part of the grammar, not an entity in its own right. Code written in this language as it stands would resemble LISP: lots of nested parentheses. I would prefer the real thing to be more LINQ-like, with left-to-right reading and connector dots.

Yes, if it was compiled into a GP language, there would be some internal data type representing r-value, but it is not exposed. The only external connection is through pseudo-variables.

Or is that precisely the idea, that it's meant to be conceptually the same approach as LINQ or Streams (and maybe even compatible with one or the other?) but based on a relational algebra rather than fold/map/filter/collect?
There is a piece missing: how it connects to a host language. But no, I don't think it can be embedded like that. It needs its own compiler, and the compiler needs to read header (and function) definitions from somewhere. For external data sources like CSV and ODBC, the compiler needs to peek at the data source. For internal APIs, reflection does the trick, but you have to define and compile your data source before running the RA compiler. Since this is a pre-processor, it can output compiled code following ordinary POCO/POJO conventions, and the output will be seen by the host as a stream of records in a LINQ/streams idiom.
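As a sketch of that host-side view (the Supplier heading and the query are invented for illustration, not actual generated output):

    using System.Collections.Generic;
    using System.Linq;

    // What the pre-processor might emit: a plain record (POCO) per result heading,
    // surfaced to the host as an ordinary IEnumerable consumable via LINQ.
    public record Supplier(string Sno, string Sname, int Status, string City);

    public static class GeneratedQuery
    {
        public static IEnumerable<Supplier> LondonSuppliers(IEnumerable<Supplier> s) =>
            s.Where(t => t.City == "London");
    }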
The point is I've already done virtually all of this in Andl, but with the added burden of defining an entire language and type system. I'm arguing a case for resolving the type system problem in TTM/D by targeting a templating system instead of a type system. C++ does not have a type called 'vector of int' or 'dictionary of string, string'. The magic insight of the STL was to define these things as templates and specialise them by type as needed. I think the same concept works here: templates and specialisation instead of the type generators in TTM.
Quote from dandl on April 8, 2020, 2:34 am

Quote from AntC on April 8, 2020, 12:39 am

Quote from dandl on April 7, 2020, 2:20 pm
Fully half of TTM is devoted to devising a type system. Unfortunately the result is one that is very hard to reconcile with other languages. In point of fact, the need for designing a whole new language has passed; it would be better to find a way to embed key features, including the RA of RM Pre 18, into a regular language with a familiar type system. An approach based on templates/generics can do that.
The problem with templates (in nearly every language, including C++) is that they can generate invalid/ill-typed code. Generics, in some languages, can also generate ill-typed code; or they need dynamic typing, which can fail at run time -- which we might as well categorise as ill-typed code.
Whether we need "a whole new language" is beside the point. We need a whole new way to do type signatures and type inference for the non-scalar types part of a DML.
That's indeed the point. Type signatures, yes, but with non-scalars akin to templated collections, not types in their own right.
The rationale is precisely the same as the one that drove the STL in C++. The operators of the RA are a set of generic algorithms, in the form of functions, which have to be specialised to apply to specific relation arguments and return values.
I'd say: the operators of an RA are polymorphic (ad-hoc/overloaded functions): they take operands with partially-known type signatures (or with a partially-known correspondence between the operands); and return a result with a partially-known signature. Yes this gets specialised when you provide specific arguments -- but that's just well-understood polymorphism/unification.
I don't think I can usefully debate polymorphism, but my intent is that at the point of compilation every type signature is known precisely. There is no run-time polymorphism, nothing akin to overloading or dispatch.
Here is an example, in a C++/C#/Java-like style.
r-value extend<args,fn,name>(r-value)
This says that extend is a function that takes a relation value as an argument and returns a relation value as its result. This RA function is specialised by template/generic parameters as follows:
- args is an ordered list of zero or more attribute names that specify the arguments for an invocation of the function fn
- fn is the name of a function (the extend computation), with a type signature to match args
- name is the attribute name for the new computed value; its type will be that of the return value of fn
- the heading of the r-value argument plus name provides the heading for the return value.

I've worked through the full set of extended operators, including aggregation, recursion and assignment. It all fits together nicely, see this.
A query (or update) written in this form can be interpreted live, or compiled into any modern high level language, TD or SQL. The target environment needs to supply a support library of callable functions. The compiler can perform type and heading inference, and guarantee safety.
This approach does not address other aspects of TTM, such as those concerning features provided by the database; it deals only with the parts that depend on a new language and new type system.
Does your approach give code that can be guaranteed statically type-safe? Does it support separate compilation, for example:
Yes.
- In the library module (separately compiled to be statically type-safe), a routine to take a relation with attributes xCoord, yCoord amongst others and return a relation with attributes rho, theta instead.

No. The library functions operate at the attribute type level, typically scalars. So there needs to be a real rho(real x, real y) function and a real theta(real x, real y) function. If local, they will be called iteratively. If compiling to SQL, you need your SQL to provide those functions.
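As a concrete sketch of such library functions (mapping real to C# double; the class name is invented):

    using System;

    // Attribute-level scalar functions of the kind the target environment must
    // supply; the compiler binds them to xCoord, yCoord and applies them per tuple.
    public static class Polar
    {
        public static double Rho(double x, double y) => Math.Sqrt(x * x + y * y);
        public static double Theta(double x, double y) => Math.Atan2(y, x);
    }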
- In the client module, statically reject a call to this routine if the argument doesn't contain xCoord, yCoord or does contain rho, theta. Statically infer that the result doesn't include xCoord, yCoord (even if it can't satisfy the presence of those attributes in the argument).

The functions are bound statically, so this is a basic compile error.
- If the client module is itself a library in which the call to the routine must be polymorphic, pass on those polymorphism restrictions to the ultimate client; possibly adding extra restrictions (and/or satisfying some of the restrictions). All this statically type-safe, of course. By "satisfying some" I mean, for example, that the compiler can see xCoord, yCoord OK, but can't satisfy the absence of rho, theta.

Not applicable, I think. No polymorphism survives compilation. The compiler sees everything.
The number of Operators in your list seems excessive. Can't I use a small core of operators to combine into that longer list? Specifically, if the core operators are polymorphic enough, it should be possible to combine them to form polymorphic derived operators. And if you can't do that, how can you be sure you can use those operators to produce arbitrarily complex relational expressions?
I intentionally included several redundant functions, because there is no reason not to. Designing a 'meta-language' in which to define these operators would be a separate issue, at this stage they are all hard-coded.
For example, why is there both join( ) and intersect( )? Isn't intersect( ) just a type-specialisation of join( )? How do you express that the arguments to intersect( ) must have the same heading?

Same answer. All the UNION derivatives are known by the compiler to require relation value arguments with the same heading, and to produce a return value with that same heading, exactly as Rel and Andl do right now.
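In a host-language rendering, that same-heading constraint can fall out of ordinary generics: if a heading is a record type T, both operands must be sequences of the same T. A minimal sketch, assuming record value equality:

    using System.Collections.Generic;
    using System.Linq;

    public static class SameHeading
    {
        // Both operands share the type parameter T, so a call with two different
        // headings simply fails to compile; records supply the value equality
        // that set intersection needs.
        public static IEnumerable<T> IntersectRel<T>(IEnumerable<T> a, IEnumerable<T> b) =>
            a.Intersect(b);
    }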
Edit: it might be worth adding others, such as remove() or rename() for multiple attributes at once. I'm not purist about this, but the syntax could get awkward.
Quote from Dave Voorhis on April 8, 2020, 6:59 am

Quote from dandl on April 8, 2020, 12:46 am

Quote from Dave Voorhis on April 7, 2020, 4:28 pm
Do you mean that in your example, extend is a generic method in a distinct language that is transpiled -- say, by a preprocessor -- to code for a modern high level language, TD or SQL?
Yes. Or stand-alone, such as a query GUI, spreadsheet or dataflow language. The gold standard: JOIN a CSV file to a Web API, SQL engine supported but not required.
Or do you mean -- and this is what I understood until I read "... compiled into any modern high level language, TD or SQL" -- that extend is a generic method (I assume belonging to a class named R_Value or similar that represents an r-value, which I presume is a relation) defined in a modern high level language like C++, C# or Java (any of which should support extend as shown) and could internally either evaluate in its native language (i.e., C++, C#, or Java), or generate and execute TD or SQL code?
If it's the latter, I'm not clear how it's conceptually distinct from C# LINQ or Java Streams?
The r-value is part of the grammar, not an entity in its own right. Code written in this language as it stands would resemble LISP: lots of nested parentheses. I would prefer the real thing to be more LINQ-like, with left-to-right reading and connector dots.

Yes, if it was compiled into a GP language, there would be some internal data type representing r-value, but it is not exposed. The only external connection is through pseudo-variables.

Or is that precisely the idea, that it's meant to be conceptually the same approach as LINQ or Streams (and maybe even compatible with one or the other?) but based on a relational algebra rather than fold/map/filter/collect?
There is a piece missing: how it connects to a host language. But no, I don't think it can be embedded like that. It needs its own compiler, and the compiler needs to read header (and function) definitions from somewhere. For external data sources like CSV and ODBC, the compiler needs to peek at the data source. For internal APIs, reflection does the trick, but you have to define and compile your data source before running the RA compiler.
That's what I'm doing with my as-yet unreleased Wrapd data abstraction layer. It has two phases, a development phase and a code generation phase. It requires that you specify the data-source connections (such as SQL queries, links to CSV files, etc.) in the development phase. The data-source connections are used to generate Java Streams-compatible tuple/record classes in the code generation phase.
So far, it works well, but it's not an implementation of the relational model. Instead, it's specifically intended to leverage access to Java Streams operators.
Though I may need to provide something akin to a relational model's JOIN operator, because the Streams assumption -- that JOIN is essentially unnecessary because instance-to-instance associations should be preexisting in class instances -- is workable but often restrictive.
Quote from AntC on April 8, 2020, 8:22 am

Quote from dandl on April 8, 2020, 2:34 am

Quote from AntC on April 8, 2020, 12:39 am

Quote from dandl on April 7, 2020, 2:20 pm
Fully half of TTM is devoted to devising a type system. Unfortunately the result is one that is very hard to reconcile with other languages. In point of fact, the need for designing a whole new language has passed; it would be better to find a way to embed key features, including the RA of RM Pre 18, into a regular language with a familiar type system. An approach based on templates/generics can do that.
The problem with templates (in nearly every language, including C++) is that they can generate invalid/ill-typed code. Generics, in some languages, can also generate ill-typed code; or they need dynamic typing, which can fail at run time -- which we might as well categorise as ill-typed code.
Whether we need "a whole new language" is beside the point. We need a whole new way to do type signatures and type inference for the non-scalar types part of a DML.
That's indeed the point. Type signatures, yes, but with non-scalars akin to templated collections, not types in their own right.
That's not a type system then. I'm out.
The rationale is precisely the same as the one that drove the STL in C++. The operators of the RA are a set of generic algorithms, in the form of functions, which have to be specialised to apply to specific relation arguments and return values.
I'd say: the operators of an RA are polymorphic (ad-hoc/overloaded functions): they take operands with partially-known type signatures (or with a partially-known correspondence between the operands); and return a result with a partially-known signature. Yes this gets specialised when you provide specific arguments -- but that's just well-understood polymorphism/unification.
I don't think I can usefully debate polymorphism, but my intent is that at the point of compilation every type signature is known precisely. There is no run-time polymorphism, nothing akin to overloading or dispatch.
There's a mess of confusion there. Ad-hoc polymorphism, aka overloading, is (or rather can be, depending on your language's type system) precisely statically type-safe and able to support separate compilation.
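A minimal C# illustration of that point: each call below binds to exactly one overload at compile time, and the overloads could equally live in a separately compiled assembly.

    public static class Fmt
    {
        // Two overloads; resolution is purely static, per call site.
        public static string Show(int x) => $"int: {x}";
        public static string Show(double x) => $"double: {x}";
    }

    public static class Client
    {
        public static void Main()
        {
            System.Console.WriteLine(Fmt.Show(42));   // binds to Show(int)
            System.Console.WriteLine(Fmt.Show(4.2));  // binds to Show(double)
        }
    }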
Here is an example, in a C++/C#/Java-like style.
r-value extend<args,fn,name>(r-value)
Ah, sorry, didn't spot this before: r-value is already a term of art, vs l-value, dating back to Strachey 1967, and as used in CPL/BCPL/C/C++. Can you find a different term?

This says that extend is a function that takes a relation value as an argument and returns a relation value as its result. This RA function is specialised by template/generic parameters as follows:
- args is an ordered list of zero or more attribute names that specify the arguments for an invocation of the function fn
- fn is the name of a function (the extend computation), with a type signature to match args
- name is the attribute name for the new computed value; its type will be that of the return value of fn
- the heading of the r-value argument plus name provides the heading for the return value.

I've worked through the full set of extended operators, including aggregation, recursion and assignment. It all fits together nicely, see this.
A query (or update) written in this form can be interpreted live, or compiled into any modern high level language, TD or SQL. The target environment needs to supply a support library of callable functions. The compiler can perform type and heading inference, and guarantee safety.
This approach does not address other aspects of TTM, such as those concerning features provided by the database; it deals only with the parts that depend on a new language and new type system.
Does your approach give code that can be guaranteed statically type-safe? Does it support separate compilation, for example:
Yes.
That contradicts your earlier remarks, and turns out to be untrue; see below.
- In the library module (separately compiled to be statically type-safe), a routine to take a relation with attributes xCoord, yCoord amongst others and return a relation with attributes rho, theta instead.

No. The library functions operate at the attribute type level, typically scalars. So there needs to be a real rho(real x, real y) function and a real theta(real x, real y) function. If local, they will be called iteratively. If compiling to SQL, you need your SQL to provide those functions.
- In the client module, statically reject a call to this routine if the argument doesn't contain xCoord, yCoord or does contain rho, theta. Statically infer that the result doesn't include xCoord, yCoord (even if it can't satisfy the presence of those attributes in the argument).

The functions are bound statically, so this is a basic compile error.
- If the client module is itself a library in which the call to the routine must be polymorphic, pass on those polymorphism restrictions to the ultimate client; possibly adding extra restrictions (and/or satisfying some of the restrictions). All this statically type-safe, of course. By "satisfying some" I mean, for example, that the compiler can see xCoord, yCoord OK, but can't satisfy the absence of rho, theta.

Not applicable, I think. No polymorphism survives compilation. The compiler sees everything.
Polymorphism must survive separate compilation: the particular overload needed might not appear in this module, so the compiler can't see it. (That's another problem with templating: it tends to need monolithic program/module structures.)
Quote from dandl on April 8, 2020, 9:13 am

Quote from Dave Voorhis on April 8, 2020, 6:59 am

Quote from dandl on April 8, 2020, 12:46 am

Quote from Dave Voorhis on April 7, 2020, 4:28 pm
Do you mean that in your example, extend is a generic method in a distinct language that is transpiled -- say, by a preprocessor -- to code for a modern high level language, TD or SQL?
Yes. Or stand-alone, such as a query GUI, spreadsheet or dataflow language. The gold standard: JOIN a CSV file to a Web API, SQL engine supported but not required.
Or do you mean -- and this is what I understood until I read "... compiled into any modern high level language, TD or SQL" -- that extend is a generic method (I assume belonging to a class named R_Value or similar that represents an r-value, which I presume is a relation) defined in a modern high level language like C++, C# or Java (any of which should support extend as shown) and could internally either evaluate in its native language (i.e., C++, C#, or Java), or generate and execute TD or SQL code?
If it's the latter, I'm not clear how it's conceptually distinct from C# LINQ or Java Streams?
The r-value is part of the grammar, not an entity in its own right. Code written in this language as it stands would resemble LISP: lots of nested parentheses. I would prefer the real thing to be more LINQ-like, with left-to-right reading and connector dots.

Yes, if it was compiled into a GP language, there would be some internal data type representing r-value, but it is not exposed. The only external connection is through pseudo-variables.

Or is that precisely the idea, that it's meant to be conceptually the same approach as LINQ or Streams (and maybe even compatible with one or the other?) but based on a relational algebra rather than fold/map/filter/collect?
There is a piece missing: how it connects to a host language. But no, I don't think it can be embedded like that. It needs its own compiler, and the compiler needs to read header (and function) definitions from somewhere. For external data sources like CSV and ODBC, the compiler needs to peek at the data source. For internal APIs, reflection does the trick, but you have to define and compile your data source before running the RA compiler.
That's what I'm doing with my as-yet unreleased Wrapd data abstraction layer. It has two phases, a development phase and a code generation phase. It requires that you specify the data-source connections (such as SQL queries, links to CSV files, etc.) in the development phase. The data-source connections are used to generate Java Streams-compatible tuple/record classes in the code generation phase.
So far, it works well, but it's not an implementation of the relational model. Instead, it's specifically intended to leverage access to Java Streams operators.
Though I may need to provide something akin to a relational model's JOIN operator, because the Streams assumption -- that JOIN is essentially unnecessary because instance-to-instance associations should be preexisting in class instances -- is workable but often restrictive.
I get the point, but that seems an odd omission. Join is foundational, AFAICT. LINQ has a serviceable join, although I never get it right the first time. How do you deal with joining two CSV tables without join?
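For reference, LINQ's join over two sequences, here standing in for two parsed CSV tables (the record types are invented for illustration):

    using System.Collections.Generic;
    using System.Linq;

    public record Customer(string CustomerId, string Name);
    public record Order(int OrderId, string CustomerId, decimal Total);

    public static class CsvJoin
    {
        // An equi-join on the shared CustomerId attribute -- roughly what a
        // relational JOIN does, minus set semantics and duplicate elimination.
        public static IEnumerable<(string Name, int OrderId, decimal Total)> JoinOrders(
            IEnumerable<Customer> customers, IEnumerable<Order> orders) =>
            customers.Join(orders,
                c => c.CustomerId,
                o => o.CustomerId,
                (c, o) => (c.Name, o.OrderId, o.Total));
    }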
But my goal is specifically to resolve the type mismatch inherent in the RA if you try to treat relations as types (as TTM does). And to join CSV files.
Edit: re separate compilation, queries written in this language have dependencies on relation value headings and library functions that must be resolved by the compiler. Change the dependencies, recompile. But I expect the query compile and host language compile to be separate. Perhaps that's not what you meant.
Quote from dandl on April 8, 2020, 9:27 amQuote from AntC on April 8, 2020, 8:22 amQuote from dandl on April 8, 2020, 2:34 amQuote from AntC on April 8, 2020, 12:39 amQuote from dandl on April 7, 2020, 2:20 pmFully half of TTM is devoted to devising a type system. Unfortunately the result is one that is very hard to reconcile with other languages. In point of fact, the need for designing a whole new language has passed; it would be better to find a way to embed key features, including the RA of RM Pre 18, into a regular language with a familiar type system. An approach based on templates/generics can do that.
The problem with templates (in nearly every language, including C++) is that they can generate invalid/ill-typed code. Generics, in some languages, can also generate ill-typed code; or they need dynamic typing, which can fail at run time -- which we might as well categorise as ill-typed code.
Whether we need "a whole new language" is beside the point. We need a whole new way to do type signatures and type inference for the non-scalar types part of a DML.
That's indeed the point. Type signatures, yes, but with non-scalars akin to templated collections, not types in their own right.
That's not a type system then. I'm out.
Pick any of the top 10 languages that people find perfectly serviceable for a wide range of GP programming tasks. Now tell me how you would reconcile their type systems.
What I describe is practical, effective and achievable.
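AntC's point above, that generics can let ill-typed code through to run time, has a standard Java illustration: erasure plus a raw type lets a bad value in, and the failure only surfaces later as a ClassCastException (a textbook example, not anyone's proposal here):

import java.util.ArrayList;
import java.util.List;

public class ErasureHole {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List raw = strings;        // raw type: compiles with only a warning
        raw.add(42);               // an Integer sneaks into a List<String>

        // The compiler trusted the declared type; the failure is deferred
        // to run time as a ClassCastException on the implicit cast here.
        String s = strings.get(0);
        System.out.println(s);
    }
}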
The rationale is precisely the same as drove the STL in C++. The operators of the RA are a set of generic algorithms in the form of functions which have to be specialised to apply to specific relation arguments and return value.
I'd say: the operators of an RA are polymorphic (ad-hoc/overloaded functions): they take operands with partially-known type signatures (or with a partially-known correspondence between the operands); and return a result with a partially-known signature. Yes this gets specialised when you provide specific arguments -- but that's just well-understood polymorphism/unification.
I don't think I can usefully debate polymorphism, but my intent is that at the point of compilation every type signature is known precisely. There is no run-time polymorphism, nothing akin to overloading or dispatch.
There's a mess of confusion there. Ad-hoc polymorphism, aka overloading, is (or rather can be, depending on your language's type system) precisely statically type-safe and able to support separate compilation.
Is that a question?
Here is an example, in a C++/C#/Java-like style.
r-value extend<args,fn,name>(r-value)
Ah, sorry didn't spot this before: r-value is already a term of art, vs l-value, dating back to Strachey 1967, and as used in CPL/BCPL/C/C++. Can you find a different term?

You're right. It just slipped out. Will do.
This says that
extend
is a function that takes a relation value as an argument and returns a relation value as its result. This RA function is specialised by template/generic parameters as follows.
args
is an ordered list of zero or more attribute names, that specify the arguments for an invocation of the functionfn
fn
is the name of a function (the extend computation), with a type signature to matchargs
name
is the attribute name for the new computed value; its type will be that of the return value offn
- the heading of the r-value argument plus
name
provide the heading for the return value.I've worked through the full set of extended operators including aggregation, recursion and assignment. It all fits together nicely, see this.
A query (or update) written in this form can be interpreted live, or compiled into any modern high level language, TD or SQL. The target environment needs to supply a support library of callable functions. The compiler can perform type and heading inference, and guarantee safety.
This approach does not deal with other aspects of TTM, such as those that deal with features provided by the database, only with those parts that depend on a new language and new type system.
Does your approach give code that can be guaranteed statically type-safe? Does it support separate compilation, for example:
Yes.
That contradicts your earlier remarks, and turns out to be untrue, see below.
Why?
- In the library module (separately compiled and statically type-safe), a routine to take a relation with attributes xCoord, yCoord amongst others and return a relation with attributes rho, theta instead.

No. The library functions operate at the attribute-type level, typically scalars. So there needs to be a real rho(real x, real y) function and a real theta(real x, real y) function. If local, it will be called iteratively. If compiling to SQL, you need your SQL to provide those functions.
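A minimal sketch of that arrangement for the local case: the library supplies the two scalar functions with the signatures named above, and the evaluator applies them once per tuple (the tuple representation is an assumption for illustration):

import java.util.List;

record Point(double xCoord, double yCoord) {}  // one tuple of the argument relation

public class PolarExtend {
    // The scalar library functions, with the signatures given above.
    static double rho(double x, double y)   { return Math.hypot(x, y); }
    static double theta(double x, double y) { return Math.atan2(y, x); }

    public static void main(String[] args) {
        var rel = List.of(new Point(3, 4), new Point(1, 1));

        // EXTEND-like evaluation: call the scalar functions iteratively, per tuple.
        rel.forEach(p -> System.out.printf("rho=%.3f theta=%.3f%n",
                rho(p.xCoord(), p.yCoord()), theta(p.xCoord(), p.yCoord())));
    }
}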
- In the client module, statically reject a call to this routine if the argument doesn't contain xCoord, yCoord or does contain rho, theta. Statically infer that the result doesn't include xCoord, yCoord (even if it can't satisfy the presence of those attributes in the argument).

The functions are bound statically, so this is a basic compile error.
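A sketch of what "bound statically" buys, assuming the heading is carried in a nominal record type (all names invented for illustration): a call with the wrong heading simply does not compile.

record Cartesian(double xCoord, double yCoord) {}
record Polar(double rho, double theta) {}

public class StaticBinding {
    static Polar toPolar(Cartesian c) {
        return new Polar(Math.hypot(c.xCoord(), c.yCoord()),
                         Math.atan2(c.yCoord(), c.xCoord()));
    }

    public static void main(String[] args) {
        System.out.println(toPolar(new Cartesian(3, 4)));  // well-typed: compiles
        // toPolar(new Polar(5.0, 0.9));  // wrong heading: rejected at compile time
    }
}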
- If the client module is itself a library in which the call to the routine must be polymorphic, pass on those polymorphism restrictions to the ultimate client; possibly adding extra restrictions (and/or satisfying some of the restrictions). All this statically type-safe, of course. By "satisfying some" I mean, for example, the compiler can see xCoord, yCoord OK, but can't satisfy the absence of rho, theta.

Not applicable, I think. No polymorphism survives compilation. The compiler sees everything.
Polymorphism must survive separate compilation: the particular overload needed might not appear in this module, so the compiler can't see it. (That's another problem with templating: it tends to need monolithic program/module structures.)
You lost me. The specific intention is to provide a library of type signatures, and for the compiler to resolve function references based on actual types. Can you give me a for instance of what you describe?
Quote from Dave Voorhis on April 8, 2020, 2:54 pm
Quote from dandl on April 8, 2020, 9:13 am
Quote from Dave Voorhis on April 8, 2020, 6:59 am
Quote from dandl on April 8, 2020, 12:46 am
Quote from Dave Voorhis on April 7, 2020, 4:28 pm
Do you mean that, in your example, extend is a generic method in a distinct language that is transpiled -- say, by a preprocessor -- to code for a modern high level language, TD or SQL?
Yes. Or stand-alone, such as a query GUI, spreadsheet or dataflow language. The gold standard: JOIN a CSV file to a Web API, SQL engine supported but not required.
Or do you mean -- and this is what I understood until I read "... compiled into any modern high level language, TD or SQL" -- that extend is a generic method (I assume belonging to a class named R_Value or similar that represents an r-value, which I presume is a relation) defined in a modern high level language like C++, C# or Java (any of which should support extend as shown) and could internally either evaluate in its native language (i.e., C++, C#, or Java), or generate and execute TD or SQL code?
If it's the latter, I'm not clear how it's conceptually distinct from C# LINQ or Java Streams?
The r-value is part of the grammar, not an entity in its own right. Code written in this language as it stands would resemble LISP: lots of nested parentheses. I would prefer the real thing to be more LINQ-like, with left-to-right reading and connector dots.

Yes, if it were compiled into a GP language, there would be some internal data type representing r-value, but it is not exposed. The only external connection is through pseudo-variables.

Or is that precisely the idea, that it's meant to be conceptually the same approach as LINQ or Streams (and maybe even compatible with one or the other?) but based on a relational algebra rather than fold/map/filter/collect?
There is a piece missing (how it connects to a host language), but no, I don't think it can be embedded like that. It needs its own compiler, and the compiler needs to read header (and function) definitions from somewhere. For external data sources like CSV and ODBC, the compiler needs to peek at the data source. For internal APIs, reflection does the trick, but you have to define and compile your data source before running the RA compiler.
That's what I'm doing with my as-yet unreleased Wrapd data abstraction layer. It has two phases: a development phase and a code-generation phase. It requires that you specify the data-source connections (such as SQL queries, links to CSV files, etc.) in the development phase. The data-source connections are used to generate Java Streams-compatible tuple/record classes in the code-generation phase.
So far, it works well, but it's not an implementation of the relational model. Instead, it's specifically intended to leverage access to Java Streams operators.
Though I may need to provide something akin to a relational model's JOIN operator, because the Streams assumption -- that JOIN is essentially unnecessary because instance-to-instance associations should be preexisting in class instances -- is workable but often restrictive.
I get the point, but that seems an odd omission. Join is foundational, AFAICT. LINQ has a serviceable join, although I never get it right the first time. How do you deal with joining two CSV tables, without join?
There are alternatives using explicit lookups on containers, or flatMap() combined with filter(), or third-party libraries like jOOλ and its innerJoin().
Philosophically, JOIN is neither foundational nor acceptably performant, particularly if you have to run the query more than once.
Instead, so the philosophy goes, you should be creating or maintaining an object graph with the appropriate relationships and querying that.
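For concreteness, the flatMap()-plus-filter() alternative mentioned above looks like this over two lists standing in for parsed CSV tables (all names invented for the example):

import java.util.List;

record Emp(String name, String deptId) {}
record Dept(String deptId, String deptName) {}

public class FlatMapJoin {
    public static void main(String[] args) {
        var emps  = List.of(new Emp("Ann", "D1"), new Emp("Bob", "D2"));
        var depts = List.of(new Dept("D1", "Sales"), new Dept("D2", "Ops"));

        // An inner join on deptId, spelled as flatMap() over one side
        // with a filter() against the other.
        emps.stream()
            .flatMap(e -> depts.stream()
                               .filter(d -> d.deptId().equals(e.deptId()))
                               .map(d -> e.name() + " -> " + d.deptName()))
            .forEach(System.out::println);
    }
}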
But my goal is specifically to resolve the type mismatch inherent in the RA if you try to treat relations as types (as TTM does). And to join CSV files.
Edit: re separate compilation, queries written in this language have dependencies on relation value headings and library functions that must be resolved by the compiler. Change the dependencies, recompile. But I expect the query compile and host language compile to be separate. Perhaps that's not what you meant.
I'm not sure I understand your last "Edit: ..." paragraph.