Java as a host language for a D
Quote from dandl on April 22, 2020, 11:17 am
I continue to ponder the question of using an existing GP language to satisfy the intent if not the letter of TTM, and what it would take to host a TTM-like type system in something like Java or C#.
It seems fairly obvious that, regardless of any other consideration, a D type system needs a system of value types. Attribute values need to be value types so they can be persisted to the database without pointers. Scalar types with components can be implemented using carefully written classes. This is perfectly doable in Java, albeit with a fair bit of boilerplate; it's easier in C#, with less of it.
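For illustration only (the class and all its members are invented for the example), a hand-written value class of that kind looks something like this:
// A hand-written immutable "scalar with components" value type; names are illustrative.
public final class Point {
    private final double x;
    private final double y;

    public Point(double x, double y) { this.x = x; this.y = y; }

    public double x() { return x; }
    public double y() { return y; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
    }

    @Override public int hashCode() { return java.util.Objects.hash(x, y); }

    @Override public String toString() { return "Point(" + x + ", " + y + ")"; }
}
All of that is boilerplate for what is conceptually a single value.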
The other key requirement is tuple and relation types generated from a heading. Regular user-defined classes just will not serve the purpose, not least because the various RA algorithms would have to be individually coded for each different heading. But it's not too hard to design a data structure based on storing attributes as an array (rather than named fields) and implementing algorithms based on indexing into an array of data values.
But on top of this you need getters and a constructor for each individual tuple type. You need machinery to ensure that types with the same heading are the same type, and to extract intermediate tuple/relation values and create types for them. Yes, it can be done with reflection, but this is never a good choice if performance is a concern. The alternative looks like boilerplate on steroids!
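A minimal sketch of the array-based structure (purely illustrative; the names and the generic getter are invented here, and a real design would add the heading-uniqueness machinery described above):
import java.util.Arrays;

// Illustrative heading-indexed tuple: values held in an array, with the heading
// (attribute names) shared by every tuple of the type; RA operators can then
// work purely by index into the value array.
public final class Tuple {
    private final String[] heading;   // attribute names, in a canonical order
    private final Object[] values;    // one value per attribute, same order

    public Tuple(String[] heading, Object[] values) {
        this.heading = heading;
        this.values = values;
    }

    public Object get(String name) {
        for (int i = 0; i < heading.length; i++)
            if (heading[i].equals(name)) return values[i];
        throw new IllegalArgumentException("no such attribute: " + name);
    }

    @Override public boolean equals(Object o) {
        return o instanceof Tuple
                && Arrays.equals(heading, ((Tuple) o).heading)
                && Arrays.equals(values, ((Tuple) o).values);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(heading) * 31 + Arrays.hashCode(values);
    }
}
A generated type for a specific heading would then add the typed getters and the factory on top of this, which is exactly where the per-type boilerplate comes from.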
So this feeds back into my previous proposal about a language extension pre-compiler. The basic tasks to be done look straightforward enough, perhaps in many ways similar to what programmers are already doing in writing data access routines.
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
Quote from Dave Voorhis on April 22, 2020, 2:02 pm
Quote from dandl on April 22, 2020, 11:17 am
I continue to ponder the question of using an existing GP language to satisfy the intent if not the letter of TTM, and what it would take to host a TTM-like type system in something like Java or C#.
It seems fairly obvious that no matter any other consideration, a D type system needs a system of value types. Attribute values need to be value types so they can be persisted to the database without pointers. Scalar types with components can be implemented using carefully written classes. This is perfectly do-able in Java, albeit with a fair bit of boilerplate. It's easier in C#, less boilerplate.
The other key requirement is tuple and relation types generated from a heading. Regular user-defined classes just will not serve the purpose, not least because the various RA algorithms would have to be individually coded for each different heading. But it's not too hard to design a data structure based on storing attributes as an array (rather than named fields) and implementing algorithms based on indexing into an array of data values.
But on top of this you need getters and a constructor for each individual tuple type. You need machinery to ensure that types with the same heading are the same type, and to extract intermediate tuple/relation values and create types for them. Yes, it can be done with reflection but this is never a good choice if performance is a concern. The alternative looks like boilerplate on steroids!
So this feeds back into my previous proposal about a language extension pre-compiler. The basic tasks to be done look straightforward enough, perhaps in many ways similar to what programmers are already doing in writing data access routines.
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
I'd never heard of Telosys.
In the Java world, generation-like (emphasis on -like; it doesn't necessarily emit source code to the outside world) things are often done with the preprocessor-ish Java annotation facility. Probably the best-known example related to your post is Lombok, which automates creating standard boilerplate methods like getters and setters, equals, hashCode, etc.
The new Java record facility essentially makes Lombok's capability built-in, so that kind of boilerplate will no longer require annotations (or Lombok), but related annotation-driven machinery like the Spring framework and the Hibernate ORM is very popular.
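For example (illustrative only, not from any of the projects discussed here), the same two-component value written with Lombok and as a record:
// With Lombok, the annotation processor generates the constructor, getters,
// equals, hashCode and toString at compile time.
@lombok.Value
class PointL {
    double x;
    double y;
}

// With the record facility (preview in Java 14, standard from Java 16),
// the language itself generates the equivalent members.
record PointR(double x, double y) {}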
There is some outright code generation like JAXB, and the Maven build system explicitly supports a code generation phase as part of a build, but code generation is currently a tad out-of-fashion. Mention of JAXB and friends usually results in grumbling and curses from my colleagues (though to be fair, so do Lombok, Spring and Hibernate, but probably with less grumbling), though I still see it in regular use.
My work-in-progress Wrapd library also uses code generation, where arbitrary data source metadata is used to generate tuple/record classes. The goal is to allow SQL queries, spreadsheets, CSV files, you-name-it to be easily used as input to Java Streams, thus eliminating a typical point of impedance mismatch without the undesirable weight of a full-blown ORM.
But it's not a relational algebra. It's the Streams API functional-programming-inspired (for lack of a better term) fold/filter/map/collect model.
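For instance, a purely illustrative fragment in that style -- here Part stands in for a generated tuple/record class, and the query is expressed Streams-style rather than relationally:
import java.util.List;
import java.util.stream.Collectors;

// Part stands in for a generated tuple/record class (illustrative only).
record Part(String name, int weight) {}

class StreamsStyleDemo {
    public static void main(String[] args) {
        List<Part> parts = List.of(
                new Part("bolt", 17), new Part("nut", 12), new Part("cam", 19));

        List<String> heavy = parts.stream()
                .filter(p -> p.weight() > 15)     // roughly a restrict
                .map(Part::name)                  // roughly a projection
                .collect(Collectors.toList());    // collect the result

        System.out.println(heavy);                // [bolt, cam]
    }
}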
Quote from dandl on April 23, 2020, 12:37 am
Quote from Dave Voorhis on April 22, 2020, 2:02 pm
Quote from dandl on April 22, 2020, 11:17 am
I continue to ponder the question of using an existing GP language to satisfy the intent if not the letter of TTM, and what it would take to host a TTM-like type system in something like Java or C#.
It seems fairly obvious that no matter any other consideration, a D type system needs a system of value types. Attribute values need to be value types so they can be persisted to the database without pointers. Scalar types with components can be implemented using carefully written classes. This is perfectly do-able in Java, albeit with a fair bit of boilerplate. It's easier in C#, less boilerplate.
The other key requirement is tuple and relation types generated from a heading. Regular user-defined classes just will not serve the purpose, not least because the various RA algorithms would have to be individually coded for each different heading. But it's not too hard to design a data structure based on storing attributes as an array (rather than named fields) and implementing algorithms based on indexing into an array of data values.
But on top of this you need getters and a constructor for each individual tuple type. You need machinery to ensure that types with the same heading are the same type, and to extract intermediate tuple/relation values and create types for them. Yes, it can be done with reflection but this is never a good choice if performance is a concern. The alternative looks like boilerplate on steroids!
So this feeds back into my previous proposal about a language extension pre-compiler. The basic tasks to be done look straightforward enough, perhaps in many ways similar to what programmers are already doing in writing data access routines.
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
I'd never heard of Telosys.
In the Java world, generation-like (emphasis on -like; it doesn't necessarily emit source code to the outside world) things are often done with the preprocessor-ish Java annotation facility. Probably the best-known example related to your post is Lombok, which automates creating standard boilerplate methods like getters and setters, equals, hashCode, etc.
That's something I don't know much about. C# has nothing quite like it.
The new Java record facility essentially makes Lombok capability built-in, so it will no longer require annotations (or Lombok), but related machinery like the Spring framework or the Hibernate ORM are very popular.
There is some outright code generation like JAXB, and the Maven build system explicitly supports a code generation phase as part of a build, but code generation is currently a tad out-of-fashion. Mention of JAXB and friends usually results in grumbling and curses from my colleagues (though to be fair, so does Lombok, Spring and Hibernate, but probably less grumbling) though I still see it in regular use.
Thanks.
My work-in-progress Wrapd library also uses code generation, where arbitrary data source metadata is used to generate tuple/record classes. The goal is to allow SQL queries, spreadsheets, CSV files, you-name-it to be easily used as input to Java Streams, thus eliminating a typical point of impedance mismatch without the undesirable weight of a full-blown ORM.
But it's not a relational algebra. It's the Streams API functional-programming-inspired (for lack of a better term) fold/filter/map/collect model.
I understand the philosophy, but it seems to me that if you generate tuple classes with enough internal machinery, implementing the RA is trivial.
I agree, I don't use Join a lot in LINQ, but I do use Select heavily and Union/Minus quite a bit, and they need more or less the same machinery. Isn't that much what you're doing?
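For example, with tuple classes that are proper value types (equals/hashCode), Union and Minus are just set operations -- a sketch only, with invented names:
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: generic Union and Minus over any element type whose equals/hashCode
// give value semantics, as generated tuple classes would.
class SetOps {
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        return Stream.concat(a.stream(), b.stream())
                     .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    static <T> Set<T> minus(Set<T> a, Set<T> b) {
        return a.stream()
                .filter(t -> !b.contains(t))
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}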
Where do you hook in the code generation?
Quote from Darren Duncan on April 23, 2020, 5:29 am
To be brief, I am using Java as a host language for my D language in progress right now.
Quote from dandl on April 23, 2020, 7:02 am
Quote from Darren Duncan on April 23, 2020, 5:29 am
To be brief, I am using Java as a host language for my D language in progress right now.
Too brief.
- What happened to Perl-alike?
- Do you have a design brief you can share?
Quote from AntC on April 23, 2020, 9:58 am
Quote from dandl on April 22, 2020, 11:17 am
...
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
For comparison in a Haskell world:
- Within Haskell syntax, there are extensions for generics and program/type reflection; plus some compiler magic that reifies a type-level representation of your datatype; then your program uses standard type overloading to 'walk' the type structure. This all happens statically at compile time; no dynamic typing.
- Outside Haskell syntax, there's a fairly conventional pre-processor that's called by the compiler to mangle the source. (No guarantee the mangling will produce valid/type-safe Haskell. Default behaviour is as per C #define, #if directives to omit sourcefile lines.)
- Outside Haskell syntax, but within the compiler, there's a template/macro language. You can intermingle valid Haskell code with template definitions and macro-application. An important feature is that its macro functions are typed: there's a type representation for the Haskell language constructs; if a macro call appears in a place expecting a type signature, the macro must return a type signature. This (almost) guarantees the macro expansion is valid well-typed Haskell -- which takes a lot of pain out of diagnosing compiler errors thrown from (invisible) macro results.
- Building on the templating, there's a parser that will parse your sourcecode and pass a stream of tokens to the utilities. By default they just walk the AST and pass it on to the macro-expander doo-hicky. You can plug in filters that look for certain constructs that would be illegal and turn them into valid Haskell AST. This is particularly useful for taking syntactic sugar/shorthands and expanding into boilerplate.
Note that Haskell's lexical structure is very regular, with very few keywords or reserved symbols. Indeed all the familiar arithmetic operators count as 'user-defined' wrt the language standard; so it's very easy to parse sourcecode into a stream of tokens. I guess this would be really hard to emulate in Tutorial D with its large number of reserved words, each with idiosyncratic syntax (JOIN can be infix or prefix; the trailing {...} for projection).
These all assume your 'host language' (i.e. vanilla Haskell) is expressive enough and type-rich enough to support the semantics you want (i.e. D-ness).
Quote from Dave Voorhis on April 23, 2020, 10:56 am
Quote from dandl on April 23, 2020, 12:37 am
Quote from Dave Voorhis on April 22, 2020, 2:02 pm
Quote from dandl on April 22, 2020, 11:17 am
I continue to ponder the question of using an existing GP language to satisfy the intent if not the letter of TTM, and what it would take to host a TTM-like type system in something like Java or C#.
It seems fairly obvious that no matter any other consideration, a D type system needs a system of value types. Attribute values need to be value types so they can be persisted to the database without pointers. Scalar types with components can be implemented using carefully written classes. This is perfectly do-able in Java, albeit with a fair bit of boilerplate. It's easier in C#, less boilerplate.
The other key requirement is tuple and relation types generated from a heading. Regular user-defined classes just will not serve the purpose, not least because the various RA algorithms would have to be individually coded for each different heading. But it's not too hard to design a data structure based on storing attributes as an array (rather than named fields) and implementing algorithms based on indexing into an array of data values.
But on top of this you need getters and a constructor for each individual tuple type. You need machinery to ensure that types with the same heading are the same type, and to extract intermediate tuple/relation values and create types for them. Yes, it can be done with reflection but this is never a good choice if performance is a concern. The alternative looks like boilerplate on steroids!
So this feeds back into my previous proposal about a language extension pre-compiler. The basic tasks to be done look straightforward enough, perhaps in many ways similar to what programmers are already doing in writing data access routines.
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
I'd never heard of Telosys.
In the Java world, generation-like (emphasis on -like; it doesn't necessarily emit source code to the outside world) things are often done with the preprocessor-ish Java annotation facility. Probably the best-known example related to your post is Lombok, which automates creating standard boilerplate methods like getters and setters, equals, hashCode, etc.
That's something I don't know much about. C# has nothing quite like it.
The new Java record facility essentially makes Lombok capability built-in, so it will no longer require annotations (or Lombok), but related machinery like the Spring framework or the Hibernate ORM are very popular.
There is some outright code generation like JAXB, and the Maven build system explicitly supports a code generation phase as part of a build, but code generation is currently a tad out-of-fashion. Mention of JAXB and friends usually results in grumbling and curses from my colleagues (though to be fair, so does Lombok, Spring and Hibernate, but probably less grumbling) though I still see it in regular use.
Thanks.
My work-in-progress Wrapd library also uses code generation, where arbitrary data source metadata is used to generate tuple/record classes. The goal is to allow SQL queries, spreadsheets, CSV files, you-name-it to be easily used as input to Java Streams, thus eliminating a typical point of impedance mismatch without the undesirable weight of a full-blown ORM.
But it's not a relational algebra. It's the Streams API functional-programming-inspired (for lack of a better term) fold/filter/map/collect model.
I understand the philosophy, but it seems to me that if you generate tuple classes with enough internal machinery, implementing the RA is trivial.
I agree, I don't use Join a lot in LINQ, but I do use Select heavily and Union/Minus quite a bit, and they need more or less the same machinery. Isn't that much what you're doing?
Yes, and I don't doubt that adding a few mechanisms to make it more RA-like -- like an explicit JOIN, INTERSECT, MINUS, etc. -- will be useful and I intend to add them later. My initial goal is to be able to easily use Java Streams on SQL, CSV, you-name-it sources.
Where do you hook in the code generation?
There are notionally three stages to development.
In Stage 1, for each relatively static data source -- like a SQL table, SQL query, CSV file, etc -- you write code to point to the data source and write a single test/builder that extracts data from it. Here's an example:
String url = "jdbc:postgresql://" + dbServer + "/" + dbDatabase;
database = new Database(url, dbUser, dbPasswd, dbTablenamePrefix);
database.transact(xact -> {
    xact.updateAll("CREATE TABLE $$tester (x INTEGER, y INTEGER, PRIMARY KEY (x, y))");
    return true;
});
var tupleClassName = "TestSelect";
database.createTupleFromQueryAll(DatabaseConfigurationAndSetup.getCodeDirectory(), tupleClassName, "SELECT * FROM $$tester");
A class called "TestSelect" will be generated by database.createTupleFromQueryAll(...) when it successfully executes in Stage 2, below. The call to createTupleFromQueryAll(...) would normally be put in an explicit unit test (which verifies that the query works, with a side-effect of generating the TestSelect tuple class) or an explicit build step (which generates the TestSelect tuple class, with a side-effect of verifying that the query works).
In Stage 2, you run a method that executes all the tests from Stage 1. It uses the metadata from each test result to generate a corresponding record/tuple class definition.
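(For illustration only -- the real generated source will differ in detail -- the TestSelect class generated for the query above is conceptually just a field-per-column carrier, which is what makes the dotted attribute access in Stage 3 below possible:)
// Conceptual sketch of a generated tuple class for "SELECT * FROM $$tester":
// one public field per result-set column. Illustrative; not the actual generated output.
public class TestSelect {
    public Integer x;
    public Integer y;
}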
In Stage 3, you create your application using the data sources from Stage 1 with the record/tuple classes from Stage 2. Like this:
database.query("SELECT * FROM $$tester WHERE x > ? AND x < ?", TestSelect.class, 3, 7)
    .forEach(tuple -> System.out.println("[TEST] " + tuple.x + ", " + tuple.y));
Note how the TestSelect class generated above is now referenced in database.query(...). The forEach Streams API method iterates over the collection of TestSelect instances returned by database.query(...), and attributes of the instances can be accessed by conventional Java dotted-identifier notation.
This is notionally the same approach used by various Java ORMs and JAXB.
It's a long-winded build process if you use it manually, but my plan is that the above will be integrated into my work-in-progress datasheet environment, so the three stages will be invoked automagically and run as needed in the background whilst you interactively or programmatically choose and alter data sources.
Quote from dandl on April 23, 2020, 11:24 am
Quote from AntC on April 23, 2020, 9:58 am
Quote from dandl on April 22, 2020, 11:17 am
...
So is a pre-compiler or code generator for data access a thing in the Java or C# world? I found Telosys, but it's hard to know whether this is a familiar approach to the working dev. Thoughts?
For comparison in a Haskell world:
- Within Haskell syntax, there's extensions for generics and program/type reflection; plus some compiler magic at compile time that reifies a type-level representation of your datatype; then statically at compile time your program uses standard type overloading to 'walk' the type structure. This all happens statically at compile time; no dynamic typing.
- Outside Haskell syntax, there's a fairly conventional pre-processor that's called by the compiler to mangle the source. (No guarantee the mangling will produce valid/type-safe Haskell. Default behaviour is as per C #define, #if directives to omit sourcefile lines.)
- Outside Haskell syntax, but within the compiler, there's a template/macro language. You can intermingle valid Haskell code with template definitions and macro-application. An important feature is that its macro functions are typed: there's a type representation for the Haskell language constructs; if a macro call appears in a place expecting a type signature, the macro must return a type signature. This (almost) guarantees the macro expansion is valid well-typed Haskell -- which takes a lot of pain out of diagnosing compiler errors thrown from (invisible) macro results.
- Building on the templating, there's a parser that will parse your sourcecode and pass a stream of tokens to the utilities. By default they just walk the AST and pass it on to the macro-expander doo-hicky. You can plug in filters that look for certain constructs that would be illegal and turn them into valid Haskell AST. This is particularly useful for taking syntactic sugar/shorthands and expanding into boilerplate.
Note that Haskell's lexical structure is very regular, with very few keywords or reserved symbols. Indeed all the familiar arithmetic operators count as 'user-defined' wrt the language standard; so it's very easy to parse sourcecode into a stream of tokens. I guess this would be really hard to emulate in Tutorial D with its large number of reserved words, each with idiosyncratic syntax (JOIN can be infix or prefix; the trailing {...} for projection).
These all assume your 'host language' (i.e. vanilla Haskell) is expressive enough and type-rich enough to support the semantics you want (i.e. D-ness).
I guess my question is not how much good stuff there is behind the scenes or how many different ways there are to tackle the problem. I could go on about the C# CodeDom and runtime class creation, but that's not the point. The question is: how easy and familiar can you make a D-like language extension look to a Haskell native?
One key point that I'm now looking at differently: the nature of a tuple type. What if it isn't a record type?
In Haskell you get value types free with the rations, but record types are a challenge. So what if you don't use one? I've sketched out a design for a tuple type that is not a record type, but instead has an array of values, an array of attribute names, a factory method and a bunch of getters. Each tuple type has an associated relation type, and implementing RA algorithms with that structure is dead easy.
But creating one of these takes quite a bit of boilerplate code, and a mechanism to ensure that each tuple type (and any temporaries, and the associated relation type) is unique as to its set of attribute names. Which is why the question about Java and code generation.
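As a sketch of why the RA part is easy once the structure is arrays (names invented here; this ignores the type-uniqueness machinery): a projection is just an index mapping worked out once from the two headings, plus set semantics to drop duplicates.
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: project a body (rows of attribute values) onto a subset
// of the heading, by indexing into each row's value array.
class ProjectSketch {
    static Set<List<Object>> project(String[] heading, Object[][] body, String[] kept) {
        int[] index = new int[kept.length];
        for (int k = 0; k < kept.length; k++)
            index[k] = Arrays.asList(heading).indexOf(kept[k]);   // position of each kept attribute

        Set<List<Object>> result = new LinkedHashSet<>();         // set semantics remove duplicate tuples
        for (Object[] row : body) {
            Object[] projected = new Object[kept.length];
            for (int k = 0; k < kept.length; k++)
                projected[k] = row[index[k]];
            result.add(Arrays.asList(projected));
        }
        return result;
    }
}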
Would a similar approach work in Haskell?
Quote from dandl on April 24, 2020, 1:30 am
I understand the philosophy, but it seems to me that if you generate tuple classes with enough internal machinery, implementing the RA is trivial.
I agree, I don't use Join a lot in LINQ, but I do use Select heavily and Union/Minus quite a bit, and they need more or less the same machinery. Isn't that much what you're doing?
Yes, and I don't doubt that adding a few mechanisms to make it more RA-like -- like an explicit JOIN, INTERSECT, MINUS, etc. -- will be useful and I intend to add them later. My initial goal is to be able to easily use Java Streams on SQL, CSV, you-name-it sources.
I'd be interested to see what your tuple classes look like, if you expect to be able to add RA features later.
Where do you hook in the code generation?
There are notionally three stages to development.
In Stage 1, for each relatively static data source -- like a SQL table, SQL query, CSV file, etc -- you write code to point to the data source and write a single test/builder that extracts data from it. Here's an example:
String url = "jdbc:postgresql://" + dbServer + "/" + dbDatabase;
database = new Database(url, dbUser, dbPasswd, dbTablenamePrefix);
database.transact(xact -> {
    xact.updateAll("CREATE TABLE $$tester (x INTEGER, y INTEGER, PRIMARY KEY (x, y))");
    return true;
});
var tupleClassName = "TestSelect";
database.createTupleFromQueryAll(DatabaseConfigurationAndSetup.getCodeDirectory(), tupleClassName, "SELECT * FROM $$tester");
A class called "TestSelect" will be generated by database.createTupleFromQueryAll(...) when it successfully executes in Stage 2, below. The call to createTupleFromQueryAll(...) would normally be put in an explicit unit test (which verifies that the query works, with a side-effect of generating the TestSelect tuple class) or an explicit build step (which generates the TestSelect tuple class, with a side-effect of verifying that the query works).
In Stage 2, you run a method that executes all the tests from Stage 1. It uses the metadata from each test result to generate a corresponding record/tuple class definition.
In Stage 3, you create your application using the data sources from Stage 1 with the record/tuple classes from Stage 2. Like this:
database.query("SELECT * FROM $$tester WHERE x > ? AND x < ?", TestSelect.class, 3, 7)
    .forEach(tuple -> System.out.println("[TEST] " + tuple.x + ", " + tuple.y));
Note how the TestSelect class generated above is now referenced in database.query(...). The forEach Streams API method iterates over the collection of TestSelect instances returned by database.query(...), and attributes of the instances can be accessed by conventional Java dotted-identifier notation.
This is notionally the same approach used by various Java ORMs and JAXB.
It's a long-winded build process if you use it manually, but my plan is that the above will be integrated into my work-in-progress datasheet environment, so the three stages will be invoked automagically and run as needed in the background whilst you interactively or programmatically choose and alter data sources.
I don't really get it. Step 1 seems to infer attribute name and type from the result set of an SQL query. That won't work for CSV files, where you need type hints. And it doesn't address the issue of user-defined types in the database. Do you have user-defined value types?
Step 2 bundles together a number of these to generate code, presumably a complete class definition for each tuple type, which I assume you import into the application.
Presumably in Step 3 database assumes a standard factory method by which it can create a tuple type instance, given just the class name?
I can read the Java code OK, but I can't easily infer the connections between the parts or the internal design decisions from the information given. And I can't see whether you can
The core questions seem to be:
- What are the key features of:
  - user-defined value types
  - generated tuple types
  - generated relation types (if you have them)?
- How do they compare with the TTM/D type system?
- Do they contain sufficient machinery to implement RA features on top?
Quote from AntC on April 24, 2020, 5:38 am
Quote from dandl on April 23, 2020, 11:24 am
Quote from AntC on April 23, 2020, 9:58 am
Quote from dandl on April 22, 2020, 11:17 am
...
I guess my question is not how much good stuff there is behind the scenes or how many different ways there are to tackle the problem. I could go on about the C# CodeDom and runtime class creation, but that's not the point. The question is: how easy and familiar can you make a D-like language extension look to a Haskell native?
One key point that I'm now looking at differently: the nature of a tuple type. What if it isn't a record type?
Yes. Yes. Yes. If you manage to get Hugs going, you can play with H98 record types (positional) alongside Trex records (named only). They both have a PhysRep that's a vector of (pointers to) values, with compile-time translation from label name to position; but they're radically different.
In Haskell you get value types free with the rations, but record types are a challenge. So what if you don't use one? I've sketched out a design for a tuple type that is not a record type, but instead has an array of values, an array of attribute names, a factory method and a bunch of getters. Each tuple type has an associated relation type, and implementing RA algorithms with that structure is dead easy.
But creating one of these takes quite a bit of boilerplate code, and a mechanism to ensure that each tuple type (and any temporaries, and associated relation type) are unique as to the set of attribute names. Which is why the question about Java and code generation.
So what I think is wrong about H98 labelled fields and the Java records proposal is that the positional implementation grins through the abstraction. That means the compiler doesn't have a free hand with the PhysRep, and then the whole type machinery is tied to Sum-of-Products, in which Products behave positionally. So (loosely speaking) a record MyRec(x = 7, y = True) is not only not equal, but not even the same type as YourRec(y = True, x = 7).

Would a similar approach work in Haskell?
Try Trex and see. Here's a demo using my somewhat-built repo (yay!)
Hugs> :load Hugs.Trex
Hugs.Trex> (x = 7 :: Int, y = True) == (y = True, x = 7)   -- records with same set of labelled fields
True                                                       -- are same type, irrespective of position
Hugs.Trex> show (y = True, x = 7)                          -- show is Haskell's toString
"(x = 7, y = True)"                                        -- label order is canonicalised
Hugs.Trex> #x (y = True, x = 7)                            -- #x makes a function to extract at label x
7
Hugs.Trex> :t #x
#x :: a\x => Rec (x :: b | a) -> b                         -- the (auto-inferred) type of #x

To explain that type signature for #x (this is the power, and there's a whole algebra of records and labels behind it):

- a, b are type variables, so this function is polymorphic.
- Rec( ) is a 'type generator', same as TUP{ } in terms of TTM.
- (x :: b | a) says: I am an argument to Rec containing a label x at some type b, and possibly other fields, call them a.
- The -> b says this function takes a Rec ( ) argument and produces a value of type b.
- The a\x => is a restriction on 'leftovers' in a: they can't contain label x. This is valuable for making more complex expressions/functions to (say) extend a record or merge/join two records.

Why did I use the strange wording "argument to Rec", not just say "a type"? Because the argument to Rec is not a type in the TTM sense of set-of-values. There are no values of type (x :: b | a); there are values of type Rec (x :: b | a) -- (y = True, x = 7) is an example.

Trex's term for (x :: b | a) is to call it a 'Row' type; you can think of it being in a different namespace; the technical description is: a 'Row' and label x are types of a different 'kind' from ordinary set-of-values types. In particular, the x-as-label is distinct from any x-as-term-variable or a, b-as-term-type-variables. You can see that in the function definition for #x -- in fact #x is just sugar for:

#x =df λ(x = x | r) → x;   -- λ is a lambda-expression

The lambda binding (i.e. pattern matching) expression (x = x | r) says: bind the value at label x to variable x (which is in a different namespace) | bind the record 'leftover' (if any) to variable r. r's type is an ordinary value-carrying record, signature r :: a\x => Rec a -- that is, its type is Rec a in which a is a 'Row' variable.

Then here's a somewhat-polymorphic record extension function:

Hugs.Trex> let extend_with_x x r = (x = x | r) in extend_with_x 7 (y = True)
(x = 7, y = True)
-- inferred type: extend_with_x :: a\x => b -> Rec a -> Rec (x :: b | a)

Note that type was inferred by the compiler, I didn't give any signature -- but in more complex polymorphic code, you'd probably need to.

I said "somewhat-polymorphic" because label name x must be hard-coded -- that is, it's a label constant or literal. The algebra behind Trex also allows for label-variables, but the implementation didn't get that far.